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Chapter  1 


Introduction 


1.1  Overview 

Linear  transforms  and  expansions  are  the  fundamental  mathematical  tools  of  signal  process¬ 
ing.  Yet  the  properties  of  linear  expansions  in  the  presence  of  coefficient  quantization  are 
not  yet  fully  understood.  These  properties  are  most  interesting  when  signal  representations 
are  with  respect  to  redundant,  or  overcomplete,  sets  of  vectors.  Exploring  the  effects  of 
quantization  in  overcomplete  linear  expansions  is  the  unifying  theme  of  this  work. 

The  core  problem  of  Chapter  2  is  depicted  in  Figure  1.1.  A  vector  x  £  Kw  is  left  multiplied 
by  a  matrix  F  to  get  y  £  Mm.  For  M  >  N,  we  have  an  overcomplete  expansion.  The  problem 
is  to  estimate  x  from  a  scalar  quantized  version  of  y.  To  put  this  in  a  solid  framework,  we 
introduce  the  concept  of  frames  and  prove  some  properties  of  frames.  We  then  show  that  the 
quality  of  reconstruction  can  be  improved  by  using  deterministic  properties  of  quantization, 
as  opposed  to  considering  quantization  to  be  the  addition  of  noise  that  is  independent  in 
each  dimension. 

In  Chapter  3,  focus  shifts  to  the  problem  of  compression,  i.e.  finding  efficient  represen¬ 
tations.  Vector  quantization  and  transform  coding  are  the  standard  methods  used  in  signal 
compression.  Vector  quantization  gives  better  rate-distortion  performance,  but  it  is  difficult 
to  implement  and  is  computationally  expensive.  The  computational  aspects  make  transform 
coding  very  attractive.  For  this  reason,  transform  coding  is  ubiquitous  in  image  compression. 

For  fine  quantization  of  a  Gaussian  signal  with  known  statistics,  the  Karhunen-Loeve 
transform  (KLT)  is  optimal  for  transform  coding  [13].  In  general,  signal  statistics  are  chang¬ 
ing  or  not  known  a  priori.  Thus,  one  must  either  estimate  the  KLT  from  finite  length  blocks 
of  the  signal  or  use  a  fixed,  signal-independent  transform.  The  former  case  is  computa¬ 
tionally  intensive  and  transmission  of  the  KLT  coefficients  can  be  prohibitively  expensive.1 
The  latter  option  is  most  commonly  used,  often  with  the  discrete  cosine  transform  (DCT). 
As  with  any  fixed  transform,  the  DCT  is  nearly  optimal  for  only  a  certain  set  of  possible 
signals.  There  has  been  considerable  work  in  the  area  of  adaptively  choosing  a  transform 
from  a  library  of  orthogonal  transforms,  for  example,  using  wavelet  packets  [29]. 

All  varieties  of  transform  coding  represent  a  signal  vector  as  a  linear  combination  of 

Tn  practical  adaptive  transform  coding  system,  20  to  40  percent  of  the  available  bit  rate  is  assigned  to 
side  information  [21,  §2.3]. 
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x  e  9tN  y  e  9tM  y  e  9tM  x  e  9tN 


Figure  1.1:  Block  diagram  of  reconstruction  from  quantized  frame  expansion. 


basis  vectors.  Notice  that  in  Figure  1.1,  y  is  a  representation  of  x  in  terms  of  the  rows 
of  F.  But  y  is  generally  not  an  efficient  representation  of  x.  A  method  that  adaptively 
chooses  a  basis  set  from  a  finite  dictionary  given  a  signal  vector  is  presented  in  Chapter  3. 
The  representation  is  generated  through  a  greedy  successive  approximation  algorithm  called 
matching  pursuit.  Much  as  the  KLT  finds  the  best  representation  “on  average,”  this  method 
finds  a  good  representation  for  the  particular  vector  being  coded.  Since  it  does  not  depend 
on  distributional  knowledge,  matching  pursuit  can  be  viewed  as  a  “universal  transform”  for 
transform  coding.2 

Some  of  the  results  of  this  report  appeared  earlier  in  [14]. 


2 The  phrase  “universal  lossy  coder”  is  avoided  because  we  assume  a  separation  into  a  transform,  fol¬ 
lowed  by  scalar  quantization  and  universal  lossless  coding.  This  separation  is  not  necessarily  optimal  but  is 
motivated  by  complexity  considerations. 
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1.2  Notation 

The  notation  used  throughout  the  report  is  summarized  in  Table  1.1  below. 


Symbol 

Definition 

Reference 

Conjugation 

* 

Conjugate  transpose 

1  '  1 

Cardinality  of  a  set 

Inner  product;  for  finite  dimensional  vectors,  (x,  y)  =  xTy 

ll’  II 

Norm  (derived  from  inner  product  through  x  =  (x,  x}1/2) 

Ran(-) 

Range  of  an  operator 

A  coefficient  in  a  linear  expansion 

§3.1,  §3.2.1 

A 

A  lattice 

Appendix  C 

$ 

A  frame  in  H 

§2.1.1 

$ 

The  dual  frame  of  $ 

§2.2.1 

Pk 

An  element  of  $ 

§2.1.1 

<fi 

An  element  of  $ 

§2.2.1 

A 

Lower  frame  bound 

§2.1.1 

B 

Upper  frame  bound 

§2.1.1 

C 

Complex  numbers 

V 

A  dictionary  in  an  adaptive  expansion 

§3.1,  §3.2.1 

E[] 

Expectation  operator 

F 

Frame  operator  associated  with  $ 

§2.1.1 

H 

Hilbert  space  RlN  or  CN 

In 

n  X  n  identity  matrix  (n  is  omitted  if  it  is  clear  from  context) 

J 

UU 

I< 

A  countable  index  set 

L2(R) 

Space  of  square-integrable  functions  over  M 

§2.1.2 

HU 

Space  of  square-summable  sequences  indexed  by  K 

§2.1.1 

M 

Cardinality  of  $  or  I) 

§2.1.1,  §3.2.1 

N 

Dimension  of  H 

A) 

Normal  distribution  with  mean  y  and  covariance  matrix  A 

M 

Real  numbers 

r 

M/N}  the  redundancy  of  $  or  I) 

§2.1.1 

TL 

Integers 

lA 

Positive  integers 

■ 

End  of  a  proof 

□ 

End  of  an  example 

Table  1.1:  Summary  of  notation 


Chapter  2 

Non-adaptive  Expansions 


Orthogonal  transforms  are  ubiquitous  in  mathematics,  science,  and  engineering.  The  basis 
functions  used  in  these  transforms  do  not  depend  on  the  particular  signal  being  analyzed 
and  hence  the  resulting  expansions  can  be  considered  non-adaptive. 

For  electrical  engineers,  frequency  domain  techniques  based  on  Fourier  transforms  and 
Fourier  series  are  second-nature.  This  chapter  describes  frames,  which  provide  a  general 
framework  for  understanding  non-orthogonal  transforms.  Frames  were  introduced  by  Dufhn 
and  Schaeffer  [10]  in  the  context  of  non-harmonic  Fourier  series.  Recent  interest  in  frames 
has  been  spurred  by  its  utility  in  analyzing  discrete  wavelet  transforms  [5,  6,  15]  and  time- 
frequency  decompositions  [22],  We  are  motivated  by  a  desire  to  understand  quantization 
effects  and  efficient  representations  in  a  general  framework. 

To  put  this  chapter  in  context,  we  will  give  a  particular  interpretation  of  Fourier  analysis 
and  discuss  a  sense  in  which  it  can  be  generalized.  Since  we  are  limiting  our  attention 
to  finite  dimensional  spaces,  consider  the  Discrete  Fourier  Transform  (DFT)  of  a  length- N 
sequence  x\n\.  We  can  interpret  the  DFT  as  giving  a  set  of  N  coefficients1 


JV-i  i  /  1  \ 

XW  =  £  -7T7*M  e-j2*kn/N  =  /  ej2 *kn/N  \  for  Q  <  fc  <  N  -  1 .  (2.1) 

n= 0  V N  \  V  N  / 


Then  the  original  sequence  can  be  reconstructed  as 


1 


x  n 


=  £7F™ 


0j27rkn/ N  _ 


!  1  \  i 

x[n\,  eJ2irWJV  \  e 3^kn/N  for  o  <  n  <  N  -  1 . 

\  WN  /  y/N  ~  ~ 

(2-2) 

In  this  manner,  the  DFT  gives  a  linear  expansion  of  a  vector  in  terms  of  the  set  of  vectors 


=  £ 

k= 0 


1  1 


l  T 


0j27rkJN 


y/N  y/N 


y/N 


Cjj2'irk(N — 1)  jN 


N- 1 


k= 0 


(2.3) 


where  the  coefficients  in  the  expansion  are  formed  by  taking  inner  products  with  the  same 
set.  Similar  expansions  can  be  found  by  replacing  (2.3)  by  other  sets  of  vectors.  Note  that 

ffihe  term  that  generally  appears  in  the  inverse  DFT  formula  has  been  distributed  between  the  DFT 
and  the  inverse  DFT.  This  is  gives  unit-norm  basis  vectors  for  analysis  and  synthesis. 
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the  set  need  not  have  only  N  elements  and  that  the  sets  used  in  analysis  (2.1)  and  synthesis 
(2.2)  may  be  different.  We  will  see  in  §2.2.1  that  the  analysis  and  synthesis  sets  must  be 
dual  frames. 

Section  2.1  begins  with  definitions  that  will  be  used  throughout  the  chapter  and  examples 
of  frames.  It  concludes  with  a  theorem  on  the  tightness  of  random  frames  and  a  discussion 
of  that  result.  Section  2.2  begins  with  a  review  of  reconstruction  from  exactly  known  frame 
coefficients.  The  remainder  of  the  section  gives  new  results  on  reconstruction  from  quantized 
frame  coefficients.  Most  previous  work  on  frame  expansions  is  predicated  either  on  exact 
knowledge  of  coefficients  or  on  coefficient  degradation  by  white  additive  noise.  For  example, 
Munch  [22]  considered  a  particular  type  of  frame  and  assumed  the  coefficients  were  subject 
to  a  stationary  noise.  This  report,  on  the  other  hand,  is  in  the  same  spirit  as  [4,  32,  33,  35] 
in  that  it  utilizes  the  deterministic  qualities  of  quantization. 


2.1  Frames 

2.1.1  Definitions  and  Basics 

The  material  in  this  subsection  is  largely  adapted  from  [6,  Ch.  3].  We  are  limiting  our 
attention  to  Hilbert  spaces  H  of  dimension  N. 

DEFINITION.  Let  $  =  {<fk}keK  C  H,  where  K  is  a  countable  index  set.  $  is  called  a  frame 
if  there  exist  A  >  0  and  B  <  oo  such  that  for  all  /  £  H , 

.4||/H2<i:i(/.^)|2<B||/l|2.  (2.4) 

keK 


A  and  B  are  called  the  frame  bounds. 

Throughout  we  will  denote  |/T|,  the  cardinality  of  K,  by  M  and  allow  M  =  oo.  The 
lower  bound  in  (2.4)  is  equivalent  to  requiring  that  $  span  H .  Thus  a  frame  will  always 
have  M  >  N.  We  will  refer  to  r  =  as  the  redundancy  of  the  frame.  Also  notice  that  one 
can  choose  B  =  ffkeK  ||pfc||2  whenever  M  <  oo. 

DEFINITION.  Let  $  be  a  frame  in  H.  $  is  called  a  tight  frame  if  the  frame  bounds  can  be 
taken  to  be  equal. 

It  is  easy  to  verify  that  if  $  is  a  tight  frame  with  ||(/?fc||  =  1  for  all  k  £  K,  then  A  =  r. 

PROPOSITION  2.1.  Let  $  =  {<fk}keK  be  a  tight  frame  with  frame  bounds  A  =  B  =  1.  If 
||(/?fc||  =  1  for  all  k  £  K,  then  $  is  an  orthonormal  basis. 

Proof:  See  §A.2.  ■ 

DEFINITION.  Let  $  =  {<fk}keK  be  a  frame  in  H .  The  frame  operator  F  is  the  linear  operator 
from  H  to  CM  defined  by2 

(Ff)k  =  {f ,  Pk).  (2.5) 

2  We  should  denote  the  codomain  by  A  (A')  to  properly  include  the  case  M  =  oo;  however,  for  notational 
simplicity  we  will  not. 
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Note  that  when  H  is  finite  dimensional,  this  operation  is  a  matrix  multiplication  where  F  is 
a  matrix  with  kth  row  equal  to  <pk.  Using  the  frame  operator,  (2.4)  can  be  rewritten  as 

AIn  <  F* F  <  (2-6) 

where  In  is  the  N  X  N  identity  matrix.  (The  matrix  inequality  AIn  <  F*F  means  that 
F*F  —  AIn  is  a  positive  semidehnite  matrix.)  In  this  notation,  F*F  =  AIn  implies  that  $ 
is  a  tight  frame. 

From  (2.6)  we  can  immediately  conclude  that  the  eigenvalues  of  F*F  lie  in  the  interval 
[A,  B];  in  the  tight  frame  case,  all  of  the  eigenvalues  are  equal.  This  gives  a  computational 
procedure  for  finding  frame  bounds.  Since  it  is  conventional  to  assume  A  is  chosen  as  large 
as  possible  and  B  is  chosen  as  small  as  possible,  we  will  sometimes  take  the  minimum  and 
maximum  eigenvalues  of  F*F  to  be  the  frame  bounds.  Note  that  it  also  follows  from  (2.6) 
that  F*F  is  invertible  because  all  of  its  eigenvalues  are  nonzero. 

Let  $  =  {(fk}kei<  be  a  frame  in  H .  Since  Span($)  =  H ,  any  vector  /  £  H  can  be  written 
as 

/  =  E  (2.7) 

keK 

for  some  set  of  coefficients  { a.k }  C  M.  If  M  >  N}  { a.k }  may  not  be  unique.  We  refer  to  (2.7) 
as  a  redundant  representation  even  though  it  is  not  necessary  that  more  than  N  of  the  a^s 
be  nonzero. 


2.1.2  Examples 


The  question  of  whether  a  set  of  vectors  form  a  frame  is  not  very  interesting  in  a  finite¬ 
dimensional  space;  any  finite  set  of  vectors  which  span  the  space  form  a  frame.  Thus  if 
M  >  N  vectors  are  chosen  randomly  with  a  circularly  symmetric  distribution  on  H ,  they 
almost  surely  form  a  frame.  An  infinite  set  in  a  finite-dimensional  space  can  form  a  frame 
only  if  the  norms  of  the  elements  decay  appropriately,  for  otherwise  a  finite  upper  frame 
bound  will  not  exist. 

Heuristically,  we  expect  tight  frames  to  have  a  certain  degree  of  uniformity  or  regularity. 
This  is  illustrated  by  the  following  examples. 

Example  1  [6].  In  H  =  M2,  let  (px  =  [0  1]T,  (p2  =  [— ^  —  f]T,  and  ip3  =  —  |]T. 

These  are  vectors  on  the  unit  circle  uniformly  spaced  by  120°.  For  any  /  =  [fi  f2]T  G  77, 


I]  l(/,  <Pk) I2 


k= 1 


\M2  + 


V3,  1 

2 

^3,  1 

2  1  2  2 

+ 

2  1  2  2 

3 

2 


2 


Thus  {(/?!,  <p2,  F3}  is  a  tight  frame  with  frame  bound  |  =  □ 

EXAMPLE  2  [37].  Consider  the  space  of  continuous-time  signals  that  are  bandlimited  to 
[ — 7r ,  7r] .  This  is  a  subspace  of  the  Hilbert  space  T2(m).  By  the  Nyquist  Sampling  Theorem 
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[26,  §3.2],  Si  =  {sinc(t  -  £;)}fcez,  where 


sinc(t) 


sin(7rt) 

nt 


1 


forms  a  basis  for  this  space.  Notice  that  Si  is  the  basis  set  for  ideal  7r-bandlimited  interpo¬ 
lation.  For  n  G  zA,  the  set 


Sn  =  <  sinc(t - ) 

n 


keh 


forms  a  tight  frame  with  redundancy  n.  An  expansion  with  respect  to  Sn  corresponds  to 
sampling  at  n  times  the  Nyquist  rate.  □ 


EXAMPLE  3.  Oversampling  of  a  periodic,  bandlimited  signal  can  be  viewed  as  a  frame 
operator  applied  to  the  signal,  where  the  frame  operator  is  associated  with  a  tight  frame.  If 
the  samples  are  quantized,  this  is  exactly  the  situation  of  oversampled  A/D  conversion  [33]. 
Let  x  =  [Xi  X2  ■  ■  ■  Xn]t  G  Kw ,  with  N  odd.  Define  a  corresponding  continuous-time 
signal  by 


w 

xc{t)  =  Xi  +  yj 


k= 1 


x2kV% 


cos 


2wkt 

T 


+  X2k+ ia/2  sin 


27 rkt 
T 


(2.8) 


and 


where  W  =  Any  real- valued,  T-periodic,  bandlimited,  continuous-time  signal  can  be 

written  in  this  form.  Let  M  >  N.  Define  a  sampled  version  of  xc(t)  by  xd[m]  =  x 
let 

y  =  [xd[  0]  xd[l\  •••  xd[M  -  1]]T . 


Then  we  have  y  =  Fx,  where 


F 


1  V2  0 

I  y/2  cos  9  \[2  sin  9 


V2 

y/2  cos  W9 


0 

\/2  sin  W9 


•) 


I  v^cos  (M'9)  v^sin  (M'9)  •••  V2cos(WM'9)  y/2sm(WM'9 ) 


(2.9) 


M'  =  M  —  1,  and  9  =  Using  the  orthogonality  properties  of  sine  and  cosine,  it  is  easy 
to  check  that  F*F  =  MU,  so  F  is  an  operator  associated  with  a  tight  frame.  Pairing 
terms  and  using  the  identity  cos2  k9  +  sin2  k9  =  1,  we  find  that  each  row  of  F  has  norm 
y/N.  Dividing  F  by  \/N  normalizes  the  frame  and  results  in  a  frame  bound  equal  to  the 
redundancy  ratio  r.  Also  note  that  r  is  the  oversampling  ratio  with  respect  to  the  Nyquist 
sampling  frequency.  □ 


2.1.3  Tightness  of  Random  Frames 

Tight  frames  constitute  an  important  class  of  frames.  As  we  will  see  in  §2.2.1,  a  tight  frame 
is  self-dual  and  hence  has  some  desirable  reconstruction  properties.  These  reconstruction 
properties  indeed  extend  smoothly  to  nearly  tight  frames,  i.e.  frames  with  ^  close  to  one. 
Also,  for  a  tight  frame  (2.4)  reduces  to  something  similar  to  Parseval’s  equality.  Thus,  a 
tight  frame  operator  scales  the  energy  of  an  input  by  a  constant  factor  A.  Furthermore, 
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it  is  shown  in  §2.2.4  that  some  properties  of  “typical”  frame  operators  depend  only  on  the 
redundancy.  This  motivates  our  interest  in  the  following  theorem. 

Theorem  2.2:  Tightness  of  Random  Frames 

Let  {(&m}<m=n  be  a  sequence  of  frames  in  RN  such  that  is  generated  by  choosing  M 
vectors  independently  with  a  uniform  distribution  on  the  unit  sphere  in  Rw.  Let  Fm  be  the 
frame  operator  associated  with  <&m-  Then,  in  the  mean  squared  sense, 

—  Fm* Fm  — >  — In  elementwise  as  M  — >  oo. 

M  N 

Proof:  See  §A.3.  ■ 


Theorem  2.2  shows  that  a  sequence  of  random  frames  with  increasing  redundancy  will 
approach  a  tight  frame.  Note  that  although  the  proof  in  Appendix  A  uses  an  unrelated 
strategy,  the  constant  1/N  is  intuitive:  If  is  a  tight  frame  with  normalized  elements, 
then  we  have  Fm* Fm  =  77 Av  because  the  frame  bound  equals  the  redundancy  of  the  frame. 

Numerical  experiments  were  performed  to  confirm  this  behavior  and  observe  the  rate 
of  convergence.  Sequences  of  frames  were  generated  by  successively  adding  random  vectors 
(chosen  according  to  the  appropriate  distribution)  to  existing  frames.  Results  shown  in 
Figures  2.1  and  2.2  are  averaged  results  for  200  sequences  of  frames  in  M4.  Figure  2.1  shows 
that  -A  and  converge  to  A.  Figure  2.2  shows  that  -j  converges  to  one. 

In  Theorem  2.2,  the  uniformity  of  the  frame  elements  over  the  unit  sphere  is  a  necessary 
condition.  This  is  illustrated  by  the  following  example. 

EXAMPLE  4.  Suppose  sequences  of  frames  in  M2  is  generated  by  choosing  vectors  <fk  = 
[cos  9  sin0]T,  where  9  is  uniformly  distributed  on  [0,  -|].  Then 


M 


Fm*Fm 


2 

j_ 

L  2  TT 


27 r 

1 


elementwise  as  M  — >■  00. 


Thus  the  sequence  of  frames  does  not  approach  a  tight  frame.  We  can  make  a  few  additional 
observations.  The  eigenvalues  of 

1  _j_ 

2  2  TT 

_j_  1 

.2  TT  2  . 

are  Ai  =  |(1  +  A)  and  A2  =  |(1  —  ^),  with  corresponding  eigenvectors  /1  =  -^[1  1]T  and 
/2  =  ^7j[l  —  1  ]T,  respectively.  The  eigenvectors  /1  and  /2  are  the  vectors  that  maximize 

and  minimize,  respectively, 


E 


-  M 

Y,  I  (/,  v>k)  l2 


U=i  J 

over  all  unit-norm  /.  This  example  reinforces  the  notion  that  tightness  of  frames  corresponds 
to  directional  uniformity.  □ 


2.2  Reconstruction  from  Frame  Coefficients 

At  this  point,  the  usage  of  frames  in  signal  analysis  is  not  yet  justified  because  we  have  not 
considered  the  problem  of  reconstructing  from  frame  coefficients. 
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In  §2.2.1,  we  review  the  basic  properties  of  reconstructing  from  (unquantized)  frame  coef¬ 
ficients.  This  material  is  adapted  from  [6].  The  subsequent  sections  consider  the  problem  of 
reconstructing  an  estimate  of  an  original  signal  from  quantized  frame  coefficients.  Classical 
methods  are  limited  by  the  assumption  that  the  quantization  noise  is  white.  Our  approach 
uses  deterministic  qualities  of  quantization  to  arrive  at  the  concept  of  consistent  reconstruc¬ 
tion.  Consistent  reconstruction  methods  yield  smaller  reconstruction  errors  than  classical 
methods. 

2.2.1  Unquantized  Case 

Let  $  be  a  frame,  assuming  the  notation  of  §2.1.1.  In  this  subsection,  we  consider  the 
problem  of  recovering  /  from  {(/,  ipk)}keK- 

Recall  that  F*F  is  invertible.  We  can  say  furthermore  that 

B~lIN  <  (F*F)~1  <  A~lIN.  (2.10) 

DEFINITION.  The  dual  frame  of  $  is  $  =  {tpk}keK,  where 

^  =  (F*F)"Vfc,  VkeK.  (2.11) 

For  a  tight  frame,  (2.11)  simplifies  to 

<pk  =  A~lLpk ,  V  k  G  K.  (2.12) 

PROPOSITION  2.3.  $  is  a  frame  with  frame  bounds  B~l  and  A-1,  i.e. 

s-iii/ii2<Ei(/-®r<-4-iii/ii2. 

keK 

The  associated  frame  operator  F  :  H  — >■  CM  satisfies  F  =  F(F*F)~1,  F*F  =  (F*F)~1,  and 
F*F  =  /jv  =  F*F.  Also,  FF*  =  F F*  is  the  orthogonal  projection  operator,  in  CM ,  onto 
Ran(F)  =  Ran(F). 

Proof:  See  [6,  p.  59].  ■ 

A  consequence  of  F*F  =  (F*F)~1  is  that  the  dual  of  $  is  $.  Another  of  the  conclusions 
of  Proposition  2.3  gives  us  the  desired  reconstruction  formula:  Namely,  F*F  =  /jv  implies 

/  =  F*F  f  =  J2(f^k)pk.  (2.13) 

keK 

This  formula  is  reminiscent  of  (2.2).  The  difference  is  that  in  (2.2),  one  set  of  vectors  plays 
the  roles  of  both  $  and  $.  This  is  because  the  set  in  (2.3)  is  a  tight  frame  in  CN.  In  analogy 
to  (2.13),  since  F*F  =  /jv,  we  can  also  write 

/  =  F*F  f  =  J2(f,PkWk.  (2.14) 

keK 

Comparing  (2.13)  and  (2.14)  emphasizes  the  “dual”  nature  of  $  and  $. 
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The  derivation  of  (2.14)  obscures  the  fact  that,  when  M  >  N}  there  should  be  many 
ways  to  write  /  as  a  linear  combination  of  vectors  from  $.  After  all,  there  is  an  IV-element 
subset  of  $  that  spans  H.  What  is  special  about  the  expansion  in  (2.14)?  This  question  is 
partially  answered  by  the  following  proposition. 

PROPOSITION  2.4.  If  /  =  J2kei<  ckPk  for  some  set  of  coefficients  {ck}keKi  then 

£  kl2>  £  l</,  Pk}\\  (2.15) 

keK  keK 

with  equality  only  if  c ^  =  (/,  <fk)  for  all  k  £  K . 

Proof:  See  [6,  p.  61].  ■ 

The  norm-minimizing  property  of  (2.15)  holds  in  the  “dual”  sense  also:  If 

/  =  £(/,  Pk)uk, 

keK 

then 

£  I (uk,  g) |2  >  £  \{pk,  g) |2 

keK  keK 

for  all  g  £  H.  Also,  using  (2.13)  has  advantages  over  other  possible  reconstruction  formulas 
when  the  frame  coefficients  are  not  known  exactly  (see  §2.2.2). 

Sometimes  we  can  reconstruct,  or  approximately  reconstruct,  without  explicitly  finding 
the  dual  frame  through  (2.11).  For  example,  if  $  is  a  tight  frame,  by  substituting  (2.12) 
into  (2.13),  we  can  write  /  =  A-1  YlkeKifi  Pk)Pk-  It  is  interesting  to  see  how  this  extends 
smoothly  to  the  case  that  $  is  close  to  tight,  i.e.  A  is  close  to  B. 

Let  p  =  -j  —  1.  If  0  <  /?  <C  1,  F*F  ss  so  (F*F)~1  ss  In-  Precisely, 

/  =  A  2  p  £  (/,  Tk)Pk  +  Rf ,  (2.16) 

A  +  B  hcK 


where  R  =  /jy  —  A+B  F*F.  (This  is  valid  for  any  p.)  Let 

2 

f°  =  A  +  B 

'  keK 


(2.17) 


It  can  be  shown  that  ||i?||  <  therefore  \\f  —  f0\\  <  y(y||/||,  so  (2.17)  gives  an 

estimate  for  /  with  bounded  error.  The  iteration 


fn  —  fn-l  T  ,  D  [{f  1  Tk)  {fn—1 1  Pk)]  Pk 

A+B  keK 


gives  a  sequence  of  estimates  satisfying 


11/ -All  < 


(2.18) 


The  dependence  on  p  in  (2.18)  shows  that  for  a  fixed  error  tolerance,  less  computation  is 
required  for  reconstruction  in  a  tight  or  nearly  tight  frame. 
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2.2.2  Classical  Method 

We  now  turn  to  the  question  of  reconstructing  when  the  frame  coefficients  {(/,  <fk)}keK  are 
degraded  in  some  way.  Any  mode  of  degradation  is  possible,  but  the  most  practical  situations 
are  additive  noise  due  to  measurement  error  or  quantization.  We  are  most  interested  in  the 
latter  case  because  of  its  implications  for  efficient  storage  and  transmission  of  information. 

Suppose  we  wish  to  approximate  /  given  F f  +  (3,  where  (3  £  CM  is  a  zero-mean  noise, 
uncorrelated  with  /.  The  key  to  finding  the  best  approximation  is  that  FH  =  Ran(F)  is 
an  /V-dimensional  subspace  of  CM.  Hence  the  component  of  (3  perpendicular  to  FH  should 
not  hinder  our  approximation,  and  the  best  approximation  is  the  projection  of  F f  +  (3  onto 
Ran(F).  By  Proposition  2.3,  this  approximation  is  given  by 

f  =  F*(Ff  +  (3).  (2.19) 


Furthermore,  because  the  component  of  (3  orthogonal  to  Ran(F)  does  not  contribute,  we 
expect  ||/  —  /||  =  ||F*/3||  to  be  smaller  than  ||/3 1| . 

To  make  this  more  precise,  recall  Example  1  of  §2.1.2.  If  (3  =  [//  (32  /?3] T ,  where  the  /3/s 
are  independent  random  variables  with  mean  zero  and  variance  <r2, 


E 


f  —  F*(F  f  +  (3) 


=  E 


=  E 


/-fE((/>  Pk)+f3k)pk 

6  k= i 


2  3 

o  E  ftkPk 

6  k= i 


4  /  3  3  \ 

=  ^(EE 


—  q-E1  {pl  +  02  +  fit  ~  /1/2  —  (32(3s  —  /?i/33)  —  xcr2- 


Here  we  have  used  the  fact  that 


T  [  1  k=£ 

TkTt  ~  |  _i  k^£  ■ 

Notice  that  this  mean-squared  error  (MSE)  is  |  of  the  2a2  MSE  that  would  appear  in  an 
orthogonal  basis  representation.  The  MSE  reduction  is  by  a  factor  of  1  /r,  where  r  is  the 
redundancy  of  the  tight  frame.  Having  0(1 /r)  MSE  behavior  is  a  general  phenomenon 
for  reconstruction  by  projection  in  a  tight  frame  representation.  It  is  a  special  case  of  the 
following  proposition. 

Proposition  2.5:  Noise  Reduction  in  Classical  Reconstruction 
Let  $  =  be  a  frame  of  unit-norm  vectors  with  associated  frame  operator  F  and  let 

(3  =  \j3i  (32  ■  ■  ■  /3m]T ,  where  the  /3/s  are  independent  random  variables  with  mean  zero  and 
variance  cr2.  Then  the  MSE  of  the  classical  reconstruction  (2.19)  satisfies 


MSE  < 


Ma2 
A2  ' 


(2.20) 
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Furthermore,  if  the  frame  is  tight,  (2.20)  holds  with  equality,  giving 


MSE 


N2a2 

M 


Na2 

r 


(2.21) 


Proof:  See  §A.4.  ■ 

Now  consider  the  case  where  the  degradation  is  due  to  quantization.  Let  x  £  Kw  and 
y  =  Fx,  where  F  £  RMxN  is  a  frame  operator.  Suppose  y  =  Q(y),  where  Q  :  Mm  — >  Mm  is 
a  scalar  quantization  function,  i.e.  Q(y)  =  [q(yi)  5(2/2)  •  •  •  5(j/m)]T,  where  q  :  M  — >  M  is  a 
scalar  quantization  function. 

One  approach  to  approximating  x  given  y  is  to  treat  the  quantization  noise  y  —  y  as  ran¬ 
dom,  independent  in  each  dimension,  and  uncorrelated  with  y.  These  assumptions  make  the 
problem  tractable  using  statistical  techniques.  The  problem  reduces  to  the  previous  prob¬ 
lem,  and  x  =  F*y  is  the  best  approximation.  Strictly  speaking,  however,  the  assumptions 
on  which  this  reconstruction  is  based  are  not  valid  because  y  —  y  is  a  deterministic  quantity 
depending  on  y,  with  interplay  between  the  components. 

2.2.3  Consistent  Reconstruction 

The  shortcoming  of  the  classical  reconstruction  method  is  that  it  disregards  deterministic 
properties  of  quantization.  As  a  result,  the  reconstruction  may  have  a  different  quantized 
value  than  the  original.  Using  the  term  introduced  by  Thao  and  Vetterli  [33],  we  say  that 
the  reconstruction  may  be  inconsistent. 

DEFINITION.  We  say  that  x  is  a  consistent  estimate  of  x  or  a  consistent  reconstruction  if 
Q(Fx )  =  Q(Fx).  A  reconstruction  that  is  not  consistent  is  said  to  be  inconsistent. 

In  words,  we  would  say  that  an  estimate  is  consistent  if  it  is  the  same  as  its  quantized 
version.  Another  way  to  understand  consistency  is  in  terms  of  partitions.  Q  induces  a 
partitioning  of  Mm.  (We  can  temporarily  remove  the  restriction  that  Q  is  a  scalar  quantizer 
and  require  only  that  the  partition  regions  are  convex.)  This  quantization  also  induces  a 
partitioning  of  RN  through  the  inverse  image  of  Q  o  F.  The  partition  of  RN  can  be  viewed  in 
another  way:  Since  Q  partitions  Mm,  it  also  partitions  the  IV-dimensional  subspace  F(Rn). 
Mapping  back  to  RN  using  F*  gives  the  partition  of  RN  induced  by  Q.  A  consistent  estimate 
is  simply  one  that  falls  in  the  same  partition  region  as  the  original. 

All  of  these  concepts  are  illustrated  for  N  =  2  and  M  =  3  in  Figure  2.3.  The  ambi¬ 
ent  space  is  Mm.  The  cube  represents  the  partition  region  in  Mm  containing  y  =  Fx  and 
has  codebook  value  y.  The  plane  is  F(Rn)  and  hence  is  the  subspace  within  which  any 
unquantized  value  must  lie.  The  intersection  of  the  plane  with  the  cube  gives  the  shaded 
triangle  within  which  a  consistent  estimate  must  lie.  Projecting  to  F(Mn),  as  in  the  classical 
reconstruction  method,  removes  the  out-of-subspace  component  of  y  —  y.  As  illustrated,  this 
type  of  reconstruction  is  not  necessarily  consistent.  For  further  geometric  interpretation  of 
quantized  frame  expansions,  refer  to  Appendix  B. 

With  no  assumptions  on  Q  other  than  that  the  partition  regions  be  convex,  a  consistent 
estimate  can  be  determined  using  the  projection  onto  convex  sets  (POCS)  algorithm.  In  this 
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Figure  2.3:  Illustration  of  consistent  reconstruction 


case  that  implies  generating  a  sequence  of  estimates  by  alternately  projecting  on  F(Rn)  and 

Q~Hy)- 

When  Q  is  a  scalar  quantizer,  a  linear  program  can  be  used  to  find  consistent  estimates. 
For  i  =  1,2,...,  M,  denote  the  quantization  stepsize  in  the  zth  component  by  A;.  For 
notational  convenience,  assume  that  the  reproduction  values  lie  halfway  between  decision 
levels.  Then  for  each  z,  |y;  —  y8j  <  To  obtain  a  consistent  estimate,  for  each  i  we  must 
have 

\{Fx)t  -  iji\  < 

Expanding  the  absolute  value,  we  find  the  constraints 


Fx  <  —A  +  y  and  Fx  >  — —A  +  y 


where  A  =  [Ai  A2  .  .  .  A m]T ,  and  the  inequalities  are  elementwise.  These  inequalities  can 
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be  combined  into 


(2.22) 


The  formulation  (2.22)  shows  that  x  can  be  determined  through  linear  programming  [31]. 
The  feasible  set  of  the  linear  program  is  exactly  the  set  of  consistent  estimates,  so  an  arbitrary 
cost  function  can  be  used. 

A  linear  program  always  returns  a  corner  of  the  feasible  set  [31,  §8.1],  so  this  type  of 
reconstruction  will  not  be  close  to  the  centroid  of  the  partition  cell.  Since  the  cells  are 
convex,  one  could  use  several  cost  functions  to  (presumably)  get  different  corners  of  the 
feasible  set  and  average  the  results.  Another  approach  is  to  use  a  quadratic  cost  function 
equal  to  the  distance  from  the  projection  estimate  given  by  (2.19).  Both  of  these  methods 
will  reduce  the  MSE  by  a  constant  factor.  They  do  not  change  the  asymptotic  behavior  of 
the  MSE  as  the  redundancy  r  is  increased. 


2.2.4  Error  Bounds 

In  this  subsection,  we  concern  ourselves  with  bounds  on  the  MSE  in  estimating  x  from  y. 
Our  fundamental  premise  is  that  any  reconstruction  method  that  gives  consistent  estimates 
is  asymptotically  (in  the  redundancy  r)  optimal.  We  now  prove  two  bounds  that  support 
this  conviction:  first,  an  0(l/r2)  MSE  lower  bound  for  any  reconstruction  algorithm;  and 
second,  an  0(l/r2)  MSE  upper  bound  for  consistent  reconstruction.  Since  we  are  varying  r, 
we  must  consider  sequences  of  frames  with  growing  redundancy. 

Theorem  2.6:  MSE  Lower  Bound 

For  any  set  of  quantized  frame  expansions,  any  reconstruction  algorithm  will  yield  an  MSE 
that  can  be  lower  bounded  by  an  0(l/r2)  expression.3 

PROOF:  The  proof  of  this  general  result  is  given  under  the  guise  of  a  more  restricted  result 
in  [35].  There  it  is  proven  that  when  the  frame  operators  correspond  to  oversampled  A/D 
conversion  (see  §2.1.2),  any  reconstruction  algorithm  will  yield  an  MSE  that  can  be  lower 
bounded  by  an  0(l/r2)  expression.  The  proof  is  based  on  counting  the  number  of  cells  in 
the  partition  of  RN  and  using  Zador’s  formula.  The  only  frame-specific  property  that  is  used 
corresponds  to  requiring  that  elements  of  the  frame  not  be  parallel.  Having  parallel  frame 
elements  would  reduce  the  number  of  cells  in  the  partition  and  hence  increase  the  MSE. 
Therefore  the  proof  extends  to  the  general  case.  ■ 

Proposition  2.7:  MSE  Upper  Bound  (Restricted  Case) 

Let  x  be  such  that  it  has  a  probability  density.4  Consider  quantized  frame  expansions  of 
x  with  frame  corresponding  to  the  frame  operator  (2.9)  and  quantization  stepsize  A.  For 
sufficiently  small  (fixed)  A,  a  consistent  reconstruction  algorithm  will  yield  an  MSE  that  can 
be  upper  bounded  by  an  0(l/r2)  expression. 

PROOF:  The  proof  is  based  on  a  correspondence  between  vectors  in  RN  and  periodic,  ban- 
dlimited,  continuous-time  signals.  Let  xcit)  be  defined  as  in  (2.8),  where  T  is  arbitrary.  Then 

3Actually,  we  must  exclude  the  case  where  x  has  a  degenerate  distribution  that  allows  for  perfect  recon¬ 
struction.  This  point  is  not  emphasized  in  [35]. 

4This  is  to  eliminate  degenerate  distributions  for  x. 
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quantized  frame  expansion  of  x  is  equivalent  to  oversampled  A/D  conversion  of  xc(t).  Ac¬ 
cording  to  Thao  and  Vetterli  [33,  Thm.  4.1],  the  MSE  can  be  upper  bounded  by  an  0(l/r2) 
expression.  One  requirement  in  applying  their  result  is  that  xcit)  must  have  sufficient  quan¬ 
tization  threshold  crossings.  In  our  more  general  framework,  this  corresponds  to  requiring 
that  the  distribution  of  x  not  be  overly  concentrated  inside  a  sphere  of  radius  A.5  Since  x 
has  a  probability  density,  this  can  be  assured  by  choosing  A  sufficiently  small.  ■ 

Conjecture  2.8:  MSE  Upper  Bound 

Under  very  general  conditions,  for  any  set  of  quantized  frame  expansions,  any  algorithm 
that  gives  consistent  estimates  will  yield  an  MSE  that  can  be  upper  bounded  by  an  0(l/r2) 
expression. 

For  this  general  upper  bound  to  hold,  some  sort  of  non-degeneracy  condition  is  required 
because  we  can  easily  construct  a  sequence  of  frames  with  increasing  r  for  which  the  frame 
coefficients  give  no  additional  information  as  r  is  increased.  For  example,  we  can  start  with 
an  orthonormal  basis  and  increase  r  by  adding  copies  of  vectors  already  in  the  frame.  Putting 
aside  pathological  cases,  simulations  for  quantization  of  a  source  uniformly  distributed  on 
[— 1,  1]^  support  this  conjecture.  Simulations  were  performed  with  three  types  of  frame 
sequences: 

I.  A  sequence  of  frames  corresponding  to  oversampled  A/D  conversion,  as  given  by 
(2.9).  This  is  the  case  in  which  we  have  a  provable  0(l/r2)  MSE  upper  bound. 

II.  For  N  =  3,4,  and  5,  Hardin,  Sloane  and  Smith  have  numerically  found  arrange¬ 
ments  of  up  to  130  points  on  TV-dimensional  unit  spheres  that  maximize  the 
minimum  Euclidean  norm  separation  [16]. 

III.  Frames  generated  by  randomly  choosing  points  on  the  unit  sphere  according  to 
a  uniform  distribution. 

Simulation  results  are  given  in  Figure  2.4.  The  dashed,  dotted,  and  solid  curves  correspond 
to  frame  types  I,  II,  and  III,  respectively.  The  data  points  marked  with  +’s  correspond 
to  using  a  linear  program  based  on  (2.22)  to  fold  consistent  estimates.  The  data  points 
marked  with  o’s  correspond  to  classical  reconstruction.  The  important  characteristics  of 
the  graph  are  the  slopes  of  the  curves.  Note  that  0(l/r)  MSE  corresponds  to  a  slope  of 
-3.01  dB/octave  and  0(l/r2)  MSE  corresponds  to  a  slope  of -6.02  dB/octave.  The  consistent 
reconstruction  algorithm  exhibits  0(l/r2)  MSE  for  each  of  the  types  of  frames.  The  classical 
method  exhibits  0(l/r)  MSE  behavior,  as  expected.  It  is  particularly  interesting  to  note 
that  the  performance  with  random  frames  is  as  good  as  with  either  of  the  other  two  types 
of  frames. 

Note  that  in  light  of  Theorem  2.2,  it  may  be  useful  to  try  to  prove  Conjecture  2.8  only 
for  tight  frames. 

5 In  most  cases,  we  assume  quantizer  offsets  such  that  zero  is  either  a  reconstruction  value  or  a  boundary 
value.  By  randomizing  the  quantizer  offset,  we  can  remove  this  condition. 
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Figure  2.4:  Experimental  results  for  reconstruction  from  quantized  frame  expansions.  Shows 
0(1  fr2)  MSE  for  consistent  reconstruction  and  0(1/?’)  MSE  for  classical  reconstruction. 

2.2.5  Rate-Distortion  Tradeoffs 

Our  discussion  of  quantized  frame  expansions  lias  focused  on  expected  distortion  without 
concern  for  rate.  In  this  subsection  we  begin  consideration  of  rate-distortion  tradeoffs. 

We  have  demonstrated  that  optimal  reconstruction  techniques  give  an  MSE  proportional 
to  l/?’2.  It  is  well  known  that  in  orthogonal  representations  the  MSE  is  proportional  to 
A2.  This  extends  to  the  frame  case  as  well.  Thus  we  have  two  ways  to  reduce  the  MSE  by 
approximately  a  factor  of  four: 

•  double  r; 

•  halve  A. 

A  priori ,  there  is  no  reason  to  think  that  these  options  each  have  the  same  effect  on  the 
rate.  As  the  simplest  possible  case,  suppose  a  frame  expansion  is  stored  (or  transmitted) 
as  M  5-bit  numbers,  for  a  total  rate  of  MB  bits  per  sample.  Doubling  r  gives  2 M  5-bit 
numbers,  for  a  total  rate  of  2 MB  bits  per  sample.  On  the  other  hand,  halving  A  results  in 
M  (5  +  l)-bit  numbers  for  a  rat e  of  only  M(B  +  1)  bits  per  sample. 

This  argument  suggests  that  halving  A  is  always  the  better  option,  but  a  few  comments 
are  in  order.  One  caveat  is  that  in  some  situations,  doubling  r  and  halving  A  may  have 
very  different  costs.  For  examples  in  oversampled  A/D  conversion,  the  monetary  cost  of 
halving  A  is  much  higher  than  that  of  doubling  r  because  it  requires  precision  trimmed 
analog  electronics.  This  is  a  major  motivating  factor  fop  oversampling. 
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Also,  if  r  is  doubled,  storing  the  result  as  2 M  5-bit  values  is  far  from  the  best  thing  to 
do.  This  is  because  many  of  the  M  additional  numbers  give  little  or  no  information  on  x. 
A  conclusion  of  Zamir  and  Feder  [38]  was  described  by  Zamir  as,  “one  good  measurement 
is  better  than  many  noisy  ones”  [39].  It  is  important  to  note  that  although  they  consider 
quantization  noise,  they  do  not  consider  consistency.  These  topics  are  discussed  further  in 
Appendix  B;  Appendix  C  explores  the  use  of  quantized  frame  expansion  as  the  first  stage  in 
a  lattice  quantizer. 

We  conclude  this  chapter  by  noting  that  it  is  complicated  to  get  efficient  signal  rep¬ 
resentations  from  highly  redundant  quantized  frame  expansions.  Because  of  redundancy, 
each  frame  coefficient  does  not  give  the  same  amount  of  information  on  x]  one  way  to  get 
an  efficient  representation  would  be  to  retain  only  the  frame  coefficients  that  give  a  lot  of 
information  on  x.  This  is  essentially  the  theme  of  the  next  chapter. 


Chapter  3 

Adaptive  Expansions 


In  this  chapter,  we  broaden  our  approach  to  finding  linear  expansions  by  allowing  the  basis 
functions  of  the  expansion  to  vary  depending  on  the  signal.  However,  we  are  not  adapting  in 
the  traditional  sense  of  making  fine  adjustments  depending  on  an  error  signal.  Instead,  our 
basic  tool  is  the  matching  pursuit  algorithm  of  [20]  in  which  the  adaptation  is  in  the  choice 
of  basis  functions  from  a  fixed  dictionary  (frame). 

In  §3.1,  we  introduce  the  optimal  approximation  problem  in  order  to  establish  its  com¬ 
putational  intractability.  The  matching  pursuit  algorithm,  described  in  §3.2,  is  a  greedy 
algorithm  for  finding  approximate  solutions  to  the  approximation  problem.  Quantization  of 
coefficients  in  matching  pursuit  leads  to  many  interesting  issues;  some  of  these  are  discussed 
in  §3.3.  Along  with  exploring  general  properties  of  matching  pursuit,  we  are  interested  in  its 
application  to  compressing  data  vectors  in  RN .  A  general  vector  compression  method  based 
on  matching  pursuit  is  described  in  §3.4. 


3.1  The  Optimal  Approximation  Problem 

At  the  end  of  the  previous  chapter,  we  noted  that  the  set  of  coefficients  from  a  highly 
redundant  frame  expansion  are,  without  sophisticated  coding,  an  inefficient  representation 
of  a  signal.  We  expect  to  find  more  efficient  representations  by  forming  a  linear  expansion 
with  respect  to  a  subset  of  the  original  frame.  This  problem  is  formalized  below. 

DEFINITION  [7,  Ch.  2],  Let  a  dictionary  T>  be  a  frame  in  H.  Let  e  >  0  and  L  £  Z+.  For 
/  £  H ,  an  expansion 

L 

f  =  J2a^kt,  (3.1) 

8  =  1 

where  cy  £  C  and  is  called  an  (e,  L)- approximation  if  \\f  —  f\\  <  e.  An  expansion 

(3.1)  that  minimizes  \\f  —  f\\  is  called  an  L-optimal  approximation. 

Since  the  cq-’s  are  not  subject  to  quantization,  these  approximation  problems  do  not 
exactly  correspond  to  finding  rate-distortion  optimal  representations  for  fixed  L.  Also,  this 
formulation  does  not  account  for  the  fact  that,  with  entropy  coding,  the  rate  associated  with 
may  depend  on  the  choice  of  dictionary  elements.  Nevertheless,  we  are  discouraged 
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from  attempting  to  find  optimal  quantized  representations  by  the  following  theorem. 
Theorem  3.1:  Intractability  of  Optimal  Approximation  [7] 

Let  k  >  1  and  let  T>  be  a  dictionary  that  contains  0(Nk)  vectors.  Let  0  <  71  <  72  <  1  and 
let  L  £  TLA  such  that  71 N  <  L  <  72  Ah  For  any  given  e  >  0  and  /  £  H ,  determining  whether 
an  (e,  ^-approximation  exists  is  NP-complete.  Finding  the  L-optimal  approximation  is 
NP-hard. 

Proof:  See  [7,  Ch.  2],  ■ 


3.2  Matching  Pursuit 

The  intractability  of  L-optimal  approximation  stems  from  the  number  of  ways  to  choose  L 
dictionary  elements.  The  complexity  is  reduced  if  the  dictionary  elements  are  chosen  one  at  a 
time  instead  of  L  at  once.  This  reduction  of  a  “global”  problem  to  simpler  “local”  problems 
is  the  defining  characteristic  of  a  greedy  algorithm.  Matching  pursuit  is  a  greedy  algorithm 
for  finding  approximate  solutions  to  the  L-optimal  approximation  problem.  It  progressively 
refines  a  signal  estimate  instead  of  finding  L  components  jointly. 

Matching  pursuit  was  introduced  to  the  signal  processing  community  in  the  context  of 
time-frequency  analysis  by  Mallat  and  Zhang  [20].  Mallat  and  his  students  have  uncovered 
many  of  its  properties  [7,  8,  9,  40]. 

3.2.1  Algorithm 

Let  T>  =  C  H  be  a  frame.  We  impose  the  additional  constraint  that  ||^fc||  =  1 

for  all  k.  We  will  call  D  our  dictionary  of  vectors.  Matching  pursuit  is  an  algorithm  to 
represent  /  £  H  by  a  linear  combination  of  elements  of  T>.  Furthermore,  matching  pursuit 
is  an  iterative  scheme  that  at  each  step  attempts  to  approximate  /  as  closely  as  possible  in  a 
greedy  manner.  We  expect  that  after  a  few  iterations  we  will  have  an  efficient  approximate 
representation  of  /. 

In  the  first  step  of  the  algorithm,  k0  is  selected  such  that  \{tpk0,  f}\  is  maximized.  Then 
/  can  be  written  as  its  projection  onto  ipk0  and  a  residue  Rif, 

f  =  (Pk0,  f)pk0  +  Rif. 

The  algorithm  is  iterated  by  treating  Rif  as  the  vector  to  be  best  approximated  by  a  multiple 
of  (fk!-  At  step  p  +  1,  kp  is  chosen  to  maximize  \{tpkp}  RPf)  |  and 

Rp+if  =  RPf  ~  {Pkp,  RPf)Pkp ■  (3.2) 

Identifying  R0f  =  /,  we  can  write 

n  —  1 

/  =  YsiWi,  Rif)<Pki  +  Rnf ■  (3.3) 

i  =  0 

Hereafter  we  will  denote  (^7,  Rif }  by  cq. 
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3.2.2  Discussion 

Matching  pursuit  is  similar  to  a  class  of  algorithms  used  in  statistics  called  projection  pursuits. 
The  proof  of  the  convergence  of  projection  pursuits  given  in  [18]  can  be  used  to  prove 
the  convergence  of  matching  pursuit  in  infinite  dimensional  spaces.  In  infinite  dimensional 
spaces,  the  convergence  can  be  quite  slow.  However,  the  convergence  is  exponential  in  finite 
dimensional  spaces  [7,  §3.1]. 

Since  oti  is  determined  by  projection,  _L  Ri+\f .  Thus  we  have  the  “energy  conser¬ 

vation”  equation 

||h’/||2  =  ||iWH2  +  «82-  (3.4) 

This  fact,  the  selection  criterion  for  and  the  fact  that  T>  spans  H ,  can  be  combined  for 
a  simple  convergence  proof  for  finite  dimensional  spaces.  In  particular,  the  energy  in  the 
residue  is  strictly  decreasing  until  /  is  exactly  represented. 

In  the  language  of  §3.1,  matching  pursuit  can  be  viewed  as  finding  a  1-optimal  approx¬ 
imation  and  then  iteratively  finding  1-optimal  approximations  on  the  resulting  residues.  If 
T>  is  an  orthonormal  basis,  matching  pursuit  finds  the  optimal  expansion.  For  an  arbitrary 
dictionary,  however,  matching  pursuit  does  not  generally  find  optimal  expansions.  In  fact,  if 
no  two  elements  of  the  dictionary  are  orthogonal,  matching  pursuit  expansions  are  not  only 
not  optimal,  but  they  do  not  converge  in  a  finite  number  of  steps  except  on  a  set  of  measure 
zero  [7,  §3.1]. 

In  the  following,  detailed  operation  counts  and  other  measures  of  complexity  will  not  be 
given  since  the  emphasis  is  not  on  implementation  details.  One  point  to  note  is  that  the  full 
set  of  inner  products  Rpf)}ff=1  need  not  be  computed  at  each  iteration.  By  (3.2), 

(Fv  Rp+l)  ivi-!  Rp)  ( Pkpi  Rp)(<Pi,  Pkp)-  (3.5) 

In  (3.5),  (ipi,  Rp)  and  ,  Rp)  are  known  from  the  previous  iteration,  so  only  (ipi,  (fkp) 
must  be  computed.  Depending  on  the  dictionary  structure,  this  may  involve  a  table  lookup 
or  a  simple  calculation.  Alternatively,  the  dictionary  can  be  structured  so  that  only  a  few 
such  inner  products  are  nonzero. 

Note  that  the  output  of  a  matching  pursuit  expansion  is  not  only  the  coefficients  (a0,  aq, 

.  .  . ),  but  also  the  indices  (k0}  £q,  .  .  . ).  For  storage  and  transmission  purposes,  we  will  have 
to  account  for  the  indices. 

3.2.3  Orthogonalized  Matching  Pursuits 

It  was  noted  that,  even  in  a  finite  dimensional  space,  matching  pursuit  is  not  guaranteed  to 
converge  in  a  finite  number  of  iterations.  This  is  a  serious  drawback  when  exact  (or  very 
precise)  signal  expansions  are  desired,  especially  since  an  optimal  algorithm  would  choose  a 
basis  from  the  dictionary  and  get  an  exact  expansion  in  N  steps.  The  cause  of  this  drawback 
is  that  at  step  p -\-  1,  tpkp  is  not  necessarily  chosen  orthogonal  to  Span({(^ }NT0 ). 

The  matching  pursuit  algorithm  can  be  modified  to  insure  that  at  each  iteration  the 
contribution  to  the  linear  expansion  is  orthogonal  to  all  previous  terms.  Convergence  in  N 
steps  is  then  guaranteed.  A  simple  method  of  accelerating  convergence  through  orthogonal- 
ization  is  described  below  [28].  The  selection  of  dictionary  elements  is  the  same  as  before. 
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After  a  dictionary  element  (pkp  is  chosen,  it  is  orthogonalized  with  respect  to  q1  before 

the  residue  Rpf  is  calculated.  (Because  of  the  orthogonalization,  no  dictionary  element  is 
chosen  twice.)  This  insures  that  Rp+\f  is  orthogonal  to  Lpkt  for  i  =  0,1,..., p.  A  better 
orthogonalization  method  is  presented  by  Kalker  and  Vetterli  in  [19]. 

It  has  been  noted  by  several  authors  [7,  19,  36]  that  for  a  small  number  of  iterations,  or¬ 
thogonal  matching  pursuit  does  not  converge  significantly  faster  than  the  non-orthogonalized 
version.  For  this  reason,  orthogonal  matching  pursuit  is  not  considered  hereafter. 

3.2.4  Relationship  to  the  Karhunen-Loeve  Transformation 

In  this  section,  we  forge  an  analogy  between  matching  pursuit  and  the  Karhunen-Loeve 
transform  (KLT).  Our  aim  is  to  show  that  matching  pursuit  has  some  of  the  properties 
that  make  the  KLT  useful  in  transform  coding.  We  assert  that  matching  pursuit  acts  as  a 
universal  transform  for  transform  coding. 

For  a  stationary,  vector-valued  random  process  X,  the  Karhunen-Loeve  transform  is  the 
unique  orthogonal  transform  U  such  that  Y  =  GX  has  a  diagonal  covariance  matrix  with  the 
eigenvalues  appearing  in  descending  order  on  the  diagonal  [21,  §1.2.4],  Note  that  determining 
the  KLT  requires  knowledge  of  the  distribution  of  X.  Approximating  the  KLT  from  data  is 
essentially  the  same  as  principal  component  analysis  [17]. 

It  is  well  known  that  the  KLT  is  the  optimal  transform  for  transform  coding.  Since  the 
limitations  to  this  result  are  not  as  well  known,  we  state  the  following  theorem  paraphrased 
from  [13]: 

Theorem  3.2:  Optimality  of  the  Karhunen-Loeve  Transform 
Consider  the  transform  coding  of  a  jointly  Gaussian  random  process.  Suppose  the  quanti¬ 
zation  is  fine  enough  to  use  high  resolution  approximations,  and  that  arbitrary  real  (non¬ 
integer)  values  can  be  allocated  to  the  resolution  of  each  (scalar)  quantizer.  Then  the  KLT 
achieves  the  lowest  overall  distortion  of  any  orthogonal  transform. 

Proof:  See  [13,  §8.6].  ■ 

Two  properties  of  the  KLT  that  make  it  good  for  transform  coding  are  qualitatively 
mimicked  by  matching  pursuit: 

1.  Energy  compaction:  For  1  <  i  <  N  —  1,  the  energy  in  {yi,  y2,  •••,  yi}  is 
maximum  over  all  orthogonal  transforms.  For  this  reason  the  KLT  is  said  to  give 
optimal  energy  compaction. 

2.  Principal  axes:  If  X  has  an  ellipsoidal  distribution  (as  when  X  is  Gaussian),  the 
it h  transformed  variable  y;  corresponds  to  the  it h  principal  axis  of  the  ellipsoid. 

This  is  closely  coupled  with  energy  compaction,  since  the  it h  principal  axis  is  the 
direction  in  which  there  is  the  it h  largest  energy. 

We  first  explore  the  energy  compaction  properties  of  matching  pursuit.  The  criterion 
for  the  choice  of  V  makes  some  degree  of  energy  compaction  obvious.  Since  we  only  solve 
1-optimal  approximation  problems,  matching  pursuit  does  not  always  give  optimal  energy 
compaction  when  more  than  one  iteration  is  performed.  However,  in  matching  pursuit  we  are 
optimizing  on  a  sample-by-sample  basis,  as  opposed  to  looking  at  average  performance  with 
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Figure  3.1:  Energy  compaction  achieved  using  matching  pursuit  on  an  M2- valued  source. 


a  fixed  transform.  Therefore  matching  pursuit  generally  gives  much  more  energy  compaction 
than  the  KLT.  In  particular,  matching  pursuit  will  give  energy  compaction  even  if  X  has  a 
diagonal  covariance  matrix,  in  which  case  the  KLT  gives  no  energy  compaction. 

Energy  compaction  performance  was  assessed  by  simulation.  In  ffi2,  two  sources  were 
used: 


•  An  uncorrelated  zero-mean  Gaussian  source  X  r^j  •v'(o ,/). 

•  A  Gaussian  source 


X  ~  AT  |0,  Aj 

where  A$  is  a  Givens  plane  rotation  matrix  (6  =  -|) 


1  0 
0  0.2 


Aa 


(3.6) 


Dictionaries  of  the  form 


V 


27 rk 

.  27 rk 

T\ 

cos  ~irr 
M 

sm  ~irr 
M 

1 

(3.7) 


were  used.  The  results  are  shown  in  Figure  3.1.  Using  matching  pursuit,  more  than  93% 
of  the  energy  is  captured  in  the  first  coefficient,  and  the  energy  compaction  increases,  with 
increasing  dictionary  redundancy.  The  KLT  would  give  |  and  |  of  the  energy  in  the  first 
coefficient  for  l  lie  uncorrelated  and  correlated  sources,  respectively. 

Simulations  were  also  performed  for  P_4-valued  sources.  Two  sources  were  used: 
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Figure  3.2:  Energy  compaction  achieved  using  matching  pursuit  on  an  P_4-valued  source. 

•  An  uncorrelated  zero-mean  Gaussian  source  X  r^j  •v'(0 ,/). 

•  A  correlated  zero-mean  Gaussian  source  formed  by  placing  a  first-order  autoregressive 
source  with  correlation  0.9  in  blocks  of  length  4. 

Dictionaries  were  generated  from  maximally  spaced  points  on  the  unit  sphere  [16].  The 
results  are  given  in  Figure  3.2.  As  expected,  the  energy  compaction  in  the  first  component 
increases  with  r ,  ranging  from  about  0.55  to  0.96.  In  this  case,  the  KLT  would  give  |  and 
~  0.8817  of  the  energy  in  the  first  coefficient  for  the  uncorrelated  and  correlated  sources, 
respectively.  So  in  this  experiment,  matching  pursuit  always  gives  better  energy  compaction 
for  the  uncorrelated  source,  and  also  does  so  for  the  highly  correlated  source  when  r  >  8. 

When  X  has  an  ellipsoidal  distribution,  geometric  intuition  suggests  that  ipk0  will  more 
likely  be  close  to  the  principal  axis  than  far  from  it.  Similarly,  given  that  ipk0  is  nearly 
parallel  to  the  principal  axis,  the  distribution  of  R0x  will  be  ellipsoidal  with  principal  axis 
equal  to  the  second  principal  axis  of  the  distribution  of  x.  (Since  the  most  we  can  possibly 
say  is  that  Lp^0  usually  is  nearly  parallel  to  the  principal  axis,  this  reasoning  is  somewhat 
weak.)  We  would  like  to  formalize  this  intuition.  In  particular,  we  attempt  to  demonstrate* 
that  the  indices  kt  can  be  used  to  estimate  the  principal  axes  and  that  the  algorithm  is  likely 
to  choose  indices  that  correspond  to  the  KLT.  We  are  however  not  asserting  that  it  would 
be  ideal  for  the  algorithm  to  choose  indices  corresponding  to  the  KLT;  matching  pursuit  is 
acting  locally ,  while  the  KLT  is  based  on  global  stationary  statistics. 

Methods  for  estimating  the  principal  axes  are  not  immediately  obvious.  We  cannot  simply 
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“average  the  s  to  estimate  the  first  principal  axis”  because,  with  a  sufficiently  regular 
and  dense  dictionary  and  an  ellipsoidal  distribution,  Ep>k0  =  0. 

For  example,  consider  quantization  of  the  M2- valued  source  from  (3.6),  expanded  using 
a  dictionary  as  in  (3.7)  with  M  =  199.  Figure  3.3  shows  histograms  of  k0  and  for  10000 
samples.  The  peaks  of  the  histograms  are  at  k0  =  33  and  =  83.  These  correspond  to 
angles  (modulo  7r)  of  |||  and  q|jy,  respectively.  These  are  very  close  to  angles  of  the  principal 
axes  of  the  distribution,  which  are  ^  and  ^4.  Unfortunately,  looking  at  peaks  of  histograms 
is  not  very  robust  and  is  limited  by  the  redundancy  and  regularity  of  the  dictionary.  Thus  we 
would  like  to  use  a  method  that  involves  averaging.  As  we  noted,  averaging  t^’s  and  t^’s 
is  meaningless.  Referring  to  Figure  3.3,  this  is  because  of  the  bimodality  of  the  histograms. 
It  also  makes  no  sense  to  average  the  index  numbers  because  this  would  not  be  invariant  to 
renumberings  of  the  dictionary,  even  those  renumberings  that  maintain  the  natural  order.1 
Using  a  dictionary  that  is  spread  along  half  of  the  unit  circle  instead  of  the  whole  unit  circle 
would  bias  the  estimates  toward  the  center  of  the  half-circle  chosen.  The  proposed  solution 
is  to  use  the  histogram  peaks  as  initial  estimates  of  the  principal  axes  and  then  “center”  the 
dictionary  around  the  corresponding  vectors.  For  concreteness,  suppose  we  are  estimating 
the  first  principal  axis.  Suppose  we  have  used  matching  pursuit  to  expand  a  set  of  samples. 
Let  k0  denote  the  histogram  peak  of  the  k0}s.  For  the  rath  sample,  we  increment  the  principal 
axis  estimate  by  <p(k0)m  if  (uUoffi’  Tk0)  —  anc^  ^  ~(f(k0)m  otherwise.  This  procedure  can  be 
applied  in  any  dimension  because  it  does  not  depend  on  an  ordering  of  dictionary  elements. 
A  potential  pitfall  is  that  if  the  dictionary  is  not  uniform,  the  histogram  peak  may  be  a  poor 
initial  estimate. 

Figure  3.4  shows  simulation  results  for  principal  axis  estimation  using  the  methods  dis¬ 
cussed  above.  The  source  is  as  given  by  (3.6)  and  the  dictionary  is  as  in  (3.7)  with  M  =  399. 
The  error  is  measured  as  an  angular  error  in  radians.  The  figure  shows  that  both  methods 
(looking  only  at  histogram  peaks  and  averaging  using  a  peak  as  an  initial  estimate)  give 
increasingly  good  estimates  as  data  accumulates.  The  averaging  method  gives  MSEs  that 
are  lower  by  about  a  factor  of  ten. 

Simulations  were  also  conducted  with  the  same  M4- valued  autoregressive  source  as  before. 
A  dictionary  of  130  maximally  spaced  unit  vectors  from  [16]  was  used.  The  results  are  shown 
in  Figure  3.5.  The  three  pairs  of  curves  correspond  to  estimating  the  first  three  principal  axes 
of  the  distribution.  The  solid  and  dashed  curves  correspond  to  the  averaging  and  histogram 
peak  methods,  respectively.  In  this  case  the  error  is  measured  as  Euclidean  distance  between 
a  unit  vector  in  the  true  axis  direction  and  the  estimated  axis  direction.  The  results  show 
that  while  the  first  axis  can  be  well-estimated,  it  is  much  harder  to  estimate  the  subsequent 
principal  axes.  The  principal  axes  are  probably  easier  to  estimate  when  the  eigenvalue  spread 
of  the  covariance  of  the  source  is  large,  but  this  is  not  explored  further  in  this  discussion. 

Before  moving  on  to  study  the  effects  of  coefficient  quantization,  we  would  like  to  explore 
the  dependence  of  index  entropy  on  r.  We  have  seen  that  increasing  dictionary  redundancy 
increases  energy  compaction.  The  price  to  pay  is  that  the  entropy  of  the  indices  goes  up. 
We  explore  this  tradeoff  through  an  example.  This  time  we  consider  a  non-ellipsoidal  source 
generated  by  equally  mixing  sources  of  the  form  (3.6)  with  9  equal  to  ^  and  —  Figure  3.6 
shows  1000  samples  from  this  source.  Note  that  the  KLT  for  this  source  is  simply  the  identity 

Tn  higher  dimensions,  there  would  generally  be  no  natural  ordering  to  dictionary  elements. 
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Histogram  of  first  index 


Figure  3.3:  Histograms  of  indices  chosen  by  matching  pursuit 


transformation.  The  samples  were  expanded  using  dictionaries  of  the  form 


V 


( 

7 rk 

.  7 rk 

T\ 

/ 

v 

cos  — 

sm  — 

1 

M 

M. 

1 

M — 1 


k=0 


(3.8) 


Figure  3.7  shows  the  resulting  energy  compactions  and  index  entropies  as  functions  of  r. 
(The  index  entropy  should  rightly  be  called  a  sample  entropy.  One  must  be  very  careful 
to  use  large  sample  sizes  to  get  relevant  sample  entropies.)  We  see  that  the  entropy  of  the 
first  index  is  proportional  to  log  r,  but  the  energy  compaction  levels  off  rather  quickly.  So  as 
log  r  is  increased,  there  are  diminishing  returns  in  energy  compaction,  but  the  cost  increases 
linearly. 


3.3  Quantized  Matching  Pursuit 

Although  matching  pursuit  has  been  applied  to  low  bit  rate  compression  problems  [19,  23, 
24,  25,  36],  which  inherently  require  coarse  coefficient  quantization,  little  work  has  been 
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Figure  3.6:  One  thousand  samples  from  a  nomqllipsoidal  sourc# 


Figure  3.7:  Energy  compaction  and  index  entropy  as  functions  of  redundancy  r  for  a  non- 
ellipsoidal  source. 


done  to  understand  the  qualitative  effects  of  coefficient  quantization  in  matching  pursuit.  In 
this  section  we  explore  some  of  these  effects.  In  §3.3.2,  application  of  matching  pursuit  to 
compress  an  M2- valued  source  is  considered  in  great  detail.  The  highlight  of  the  subsection 
is  the  generation  of  intricate  partition  diagrams.  These  partition  diagrams  demonstrate 
that  matching  pursuit  expansions  can  be  inconsistent.  The  issue  of  consistency  in  these: 
expansions  is  explored  in  §3.3.3.  The  relationship  between  quantized  matching  pursuit  and 
other  vector  quantization  methods  is  discussed  in  §3.3.4. 

3.3.1  Discussion 

Coefficients  are  quantized  in  any  computer  implementation  of  matching  pursuit.  When  the; 
quantization  is  fine,  it  is  generally  safe  to  ignore  this  fact.  For  example,  in  all  the  simulations 
of  §3.2,  coefficient  quantization  does  not  make  any  qualitative  differences.  If  the  quantization 
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is  coarse,  as  it  must  be  for  moderate  to  low  bit  rate  compression  applications,  the  effects  of 
quantization  may  be  significant. 

Define  quantized  matching  pursuit  to  be  matching  pursuit  with  non-negligible  quantiza¬ 
tion  of  the  coefficients.  We  will  denote  the  quantized  coefficients  by  cq  =  Q[cq].  Note  that 
quantization  destroys  the  orthogonality  of  the  projection  and  residual,  so  the  analog  of  (3.4) 
does  not  hold,  i.e. 

\\Rtf\\2^\\Rt+1f\\2  +  N f. 

Also,  (3.5)  does  not  hold. 

We  are  assuming  that  the  quantization  of  cq  occurs  before  the  residual  Ri+if  is  calculated, 
and  that  the  quantized  version  is  used  in  determining  the  residual  so  that  quantization  errors 
do  not  propagate  to  subsequent  iterations.  Since  cq  must  be  determined  before  cq+i,  it  is 
implicit  in  this  assumption  is  that  the  coefficient  quantization  is  scalar. 

For  any  particular  application,  there  are  several  design  problems:  a  dictionary  must  be 
chosen,  scalar  quantizers  must  be  designed,  and  the  number  of  iterations  (or  a  stopping 
criterion)  must  be  set.  In  principle,  these  could  be  jointly  optimized  for  a  given  source 
distribution,  distortion  measure,  and  rate  measure.  In  practice,  this  is  an  overly  broad 
problem.  In  the  following  subsection,  we  will  make  several  choices,  some  of  them  arbitrary. 


3.3.2  A  Detailed  Example 

Consider  quantization  of  a  source  with  a  uniform  distribution  on  [—1,  l]2.  Assume  distortion 
is  measured  by  squared  Euclidean  distance  and  rate  is  measured  by  codebook  size.  (Measur¬ 
ing  rate  by  codebook  size  is  natural  when  a  fixed  rate  coder  will  be  applied  to  the  quantizer 
output,  i.e.  no  entropy  coding  is  used.)  Also  assume  that  two  iterations  will  be  performed 
with  a  four  element  dictionary.  Other  constraints  will  be  set  as  needed. 

We  first  choose  a  dictionary.  Guided  by  symmetry,  we  choose 


V 


(2k  —  1 )  7T 

cos - 

8 


sm  ■ 


(2k  —  1 )  7T 


l  T ' 


8 


k= 1 


A  first  impulse  may  be  to  use 


V 


cos 


(k  —  1 )  7T  .  (k  —  1 )  7T 


l  T ' 


sm  ■ 


k= 1 


(3.9) 


(3.10) 


In  a  detailed  analysis,  (3.9)  was  determined  to  lead  to  a  better  design.  Also,  (3.10)  is  not 
symmetric  with  respect  to  the  region  of  support  of  the  distribution. 

To  begin  with,  assume  that  the  quantization  of  coefficients  will  be  fine.  Then,  since  the 
dictionary  is  composed  of  pairs  of  orthogonal  vectors,  ipk0  -L  Ffci-  Thus  once  we  have  coded 
k0}  ki  is  determined  for  free.  (As  long  as  we  are  using  a  fine  quantization  assumption,  we 
will  actually  force  the  to  be  selected  such  that  ipk0  T  t^.)  It  is  easy  to  see  that  k0  will 
be  uniformly  distributed  on  {1,  2,  3,  4};  thus,  with  or  without  entropy  coding,  k0  requires  2 
bits. 
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We  now  design  the  quantizers.  The  p.d.f.  of  ay  can  be  explicitly  calculated  as 


Pa0{y) 


2(v/2  —  l)|y| _  \y\  <  ^2  +  ^2 

<  -2(|y|-^l  +  V2)  +  <  |y|  < 

0  otherwise 


(3.11) 


If  the  dictionary  was  not  symmetric,  (3.11)  would  have  to  be  conditioned  on  Ay.  Since  we 
are  assuming  fine  quantization,  the  best  codebook  constrained  quantizer  for  ay  can  be  found 
analytically  using  a  compandor  model  [13].  The  optimal  quantizer  is 

a0  =  G  1  o  Qu  o  G(a0)} 


where 


G(y) 


(2+%i/6  sgn(^)  y4/3 

sgn(y) 


1  +  72  “ 


21/3 


(2-y^)1/6 


ysgn(y)  -  ^  1  + 


4/3' 


|y|<^±^ 

otherwise 


and  Qu  is  a  uniform  quantizer. 

Given  ay,  the  distribution  of  a i  is  uniform  on  [—  |ay|,  |ay|],  Since  the  quantization  of  ay 
is  hne,  the  distribution  given  ay  is  approximately  the  same.  Thus  the  optimal  quantizer  for 
aq  is  uniform. 

We  have  yet  to  decide  how  to  divide  our  bit  rate  between  ay  and  aq.  Since  ipk0  T 
the  total  distortion  is  simply  the  sum  of  the  distortions  created  by  each  quantization.  We 
can  thus  minimize  distortion  for  a  fixed  rate  by  Lagrangian  methods. 

If  we  impose  the  constraint  that  the  rate  for  aq  must  be  constant,  we  get  a  codebook  as 
in  Figure  3.8(a).  On  the  other  hand,  if  we  allow  the  rate  for  oq  to  be  conditioned  on  oy, 
we  get  a  codebook  as  in  Figure  3.8(b).  (Actually,  these  codebooks  are  for  the  dictionary 
(3.10),  but  the  observations  and  conclusions  are  still  clear.)  The  two  codebooks  have  906 
and  900  elements,  respectively,  so  they  give  approximately  equal  rates.  The  codebook  in 
Figure  3.8(b)  gives  lower  distortion,  as  is  clear  from  the  more  uniform  distribution  of  code 
vectors. 

When  the  rate  for  oq  depends  on  oy,  the  Lagrangian  optimization  implies  that  the  number 
of  quantization  levels  for  aq  should  be  proportional  to  .  Using  a  codebook  size  of  304 
and  choosing  the  proportionality  constant  appropriately  yields  the  codebook  and  partition 
shown  in  Figure  3.9.  This  codebook  gives  approximately  0.1561  bits  worse  performance  than 
simple  uniform  scalar  quantization.  (Recall  that  this  includes  two  bits  for  Ay.)  Of  course, 
this  should  not  be  too  discouraging  because  the  region  of  support  and  distribution  of  the 
source  in  this  simulation  are  tailor-made  for  uniform  scalar  quantization.  As  we  will  see  in 
§3.4,  matching  pursuit  tends  to  be  effective  when  the  number  of  iterations  is  less  than  the 
dimension  of  the  space. 

Figure  3.9  should  be  seen  as  a  first  approximation  to  the  type  of  partition  created  by 
matching  pursuit  because  we  forced  ipk0  -L  Ffci-  (This  was  part  of  the  hne  quantization 
assumption.)  Let  us  now  remove  the  hne  quantization  assumption  and  allow  the  source  to 
have  an  arbitrary  distribution  on  M2.  Even  with  a  known  distribution,  it  is  difficult  to  hnd 
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analytical  expressions  for  optimal  quantizers  without  using  a  fine  quantization  assumption. 
Since  we  wish  to  use  fixed,  untrained  quantizers,  we  will  use  uniform  quantizers  for  a0  and 
aq.  Since  it  will  still  generally  be  true  that  ipk0  -L  Pkn  it  makes  sense  for  the  quantization 
stepsizes  for  a0  and  aq  to  be  equal. 

The  partitions  generated  by  matching  pursuit  are  very  intricate.  Figure  3.10  shows 
the  partitioning  of  the  first  quadrant  when  zero  is  a  quantizer  boundary  value,  i.e.  the 
quantizer  boundary  points  are  {mA}mez  and  reconstruction  points  are  {(to  +  |)A}mez  for 
some  quantization  stepsize  A.  The  yellow  lines  denote  the  partitions  induced  by  selection 
of  k0.  Then  a0  is  quantized,  giving  the  cyan  boundaries.  Recall  that  the  residue  Rif  is  not 
necessarily  orthogonal  to  <fk0.  Thus  the  selection  of  £q  introduces  the  magenta  boundaries. 
Finally,  the  red  boundaries  come  from  quantizing  aq.  In  Figure  3.10,  most  of  the  cells  are 
squares,  but  there  are  also  some  smaller  cells.  Unless  the  source  distribution  happens  to  have 
high  density  in  the  smaller  cells,  the  smaller  cells  are  inefficient  in  a  rate-distortion  sense. 
The  fraction  of  cells  that  are  not  square  — >■  0  as  A  — >■  0. 

The  partition  is  qualitatively  different  when  the  quantizer  boundary  points  are 
{(to  +  |)A}mez  and  reconstruction  points  are  The  partition  is  shown  in  Fig¬ 

ure  3.11.  The  colors  are  the  same  as  in  Figure  3.10.  The  dotted  magenta  lines  show  bound¬ 
aries  that  are  created  by  choice  of  £q  but  are  not  important  because  aq  =  0.  (Similarly  for 
the  dotted  yellow  line.)  This  partition  also  has  mostly  square  cells.  Compared  to  Figure  3.10, 
there  are  fewer  of  the  “bad”  small  cells.  As  before,  the  fraction  of  non-square  cells  vanishes 
as  A  — >  0. 

The  qualitative  difference  between  Figure  3.9  and  Figures  3.10-3.11  is  due  to  the  fact  that 
the  latter  result  from  more  constraints.  The  partition  of  Figure  3.9  arises  from  specifying  k0} 
cto  and  oq,  with  £q  =  k0  +  2  (mod  4).  The  partitions  of  Figures  3.10-3.11  show  the  result 
of  adding  an  additional  degree  of  freedom  in  £q . 

These  examples  illustrate  that  there  are  many  design  parameters  within  the  matching 
pursuit  framework.  Optimizing  these  parameters  requires  a  measure  of  optimality  and  knowl¬ 
edge  of  the  source  p.d.f.  Figures  3.9-3.11  show  that  the  partitions  generated  by  matching 
pursuit  look  quite  different  than  those  generated  by  a  quantized  frame  expansion  (see  Fig¬ 
ure  B.l),  of  which  independent  scalar  quantization  is  a  special  case. 

3.3.3  Consistency  in  Quantized  Matching  Pursuit 

When  consistency  was  previously  considered  in  §2.2.3,  the  problem  arose  from  having  a 
representation  in  CM  and  attempting  to  estimate  a  reconstruction  in  CN .  There  is  a  possi¬ 
bility  of  inconsistency  in  any  framework  with  non-orthogonal  linear  constraints.  We  will  see 
that  a  matching  pursuit  representation  implicitly  contains  many  linear  constraints  and  that 
inconsistency  is  not  uncommon. 

Suppose  p  iterations  of  matching  pursuit  are  performed  with  the  dictionary  T>.  The 
output  of  the  (quantized)  matching  pursuit  algorithm  is 
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Figure  3.10:  Partitioning  of  ffi2  by  matching  pursuit  with  four  element  dictionary.  Zero  is  a 
quantizer  boundary  value. 
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Figure  3.11:  Partitioning  of  ffi2  by  matching  pursuit  with  four  element  dictionary.  Zero  is  a 
quantizer  reconstruction  value. 
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(There  is  nothing  consistent  or  inconsistent  about  this  set.)  The  standard  reconstruction  is 

p- 1 

/  =  (3.13) 

i  =  0 


Denote  the  output  of  matching  pursuit  (with  the  same  dictionary  and  quantizers)  applied 
to  /  by 

-{/kq,  (Xq  ,  k1,  oq  ,  .  .  .  ,  (Xp—i 


If 

ki  =  k[  and  cq  =  o'/  (3-14) 

for  i  =  0,  1,  .  .  . ,  p  —  1,  we  say  that  /  is  a  strictly  consistent  estimate.  If  (3.14)  holds 
except  possibly  that  ki  ^  k[  for  some  i  for  which  cq  =  cq7  =  0,  we  say  that  /  is  a  loosely 
consistent  estimate.  The  second  definition  is  included  because  a  reasonable  coding  scheme 
might  discard  ki  if  ay  =  0. 

The  crucial  point  is  that  there  is  more  information  in  (3.12),  along  with  T>  and  knowledge 
of  the  working  of  matching  pursuit,  than  there  is  in  /.  In  particular,  (3.12)  gives  a  set  of 
linear  inequality  constraints  that  defines  a  partition  cell  in  which  /  lies.  /  is  an  estimate  of 
/  that  does  not  necessarily  lie  in  this  cell. 

Let  us  now  list  the  complete  set  of  constraints  implied  by  (3.12).  For  notational  con¬ 
venience,  we  assume  uniform  scalar  quantization  of  the  coefficients  with  stepsize  A  and 
midpoint  reconstruction.  The  selection  of  k0  implies 


l(v*o.  /)l  >  Kv’.  /)l .  e  v. 


(3.15) 


For  each  element  of  T>  \  (3.15)  specifies  a  half-space  constraint  with  boundary  plane 

passing  through  the  origin.  The  intersection  of  these  constraints  is  thus  two  infinite  pyramids 
situated  symmetrically  with  their  apexes  at  the  origin.  The  value  of  oq  gives  the  constraint 


(FA,  f)  £ 


a0 


A 


A' 


V'Qo  +  j. 


This  specifies  a  pair  of  planes,  perpendicular  to  ipk0,  between  which  /  must  lie.  At  the 
(i  —  1 ) st  step,  the  selection  of  ki  gives  the  constraints 


Tkt 


> 


I  i-i 

F,  /  -Y^WVki 


1=0 


(3.16) 


This  defines  M  —  1  linear  half-space  constraints  with  boundaries  passing  through  )Cr=o  ctePkr 
As  before,  these  define  two  infinite  pyramids  situated  symmetrically  with  their  apexes  at 
Then  a}  gives 


/  8-1  \ 

FA,  /  -^2^iTke  )  G 
\  i=o  / 


_  A  _  A' 
ai  ~  y,  ai  +  y 


(3.17) 


This  again  specifies  a  pair  of  planes,  this  time  perpendicular  to  between  which  /  must 
lie. 
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O 


(a)  (b) 

Figure  3.12:  (a)  Portion  of  partition  of  Figure  3.10  with  reconstruction  points  marked,  (b) 
Portion  of  partition  of  Figure  3.11  with  reconstruction  points  marked. 


By  being  explicit  about  the  constraints,  we  see  that  all  of  the  constraints  are  linear,  so  the 
partition  cell  defined  by  (3.12)  is  convex.  Thus  by  using  an  appropriate  projection  operator, 
one  can  find  a  strictly  consistent  estimate  from  any  initial  estimate.  In  practice,  hireling  such 
a  projection  operator  may  be  difficult. 

The  quantization  of  ffi2  considered  in  §3.3.2  gives  concrete  examples  of  inconsistency. 
Recall  the  partitions  of  Figures  3.10  and  3.11.  The  reconstruction  points  were  not  marked  on 
thesis  diagrams  because  the  correspondence  between  cells  and  reconstruction  points  would 
not  have  been  clear.  Figures  3.12(a)  and  3.12(b)  depicts  parts  of  these  partitions  with 
reconstruction  points  marked  with  circles.  These  show  that  matching  pursuit  reconstructions 
are  not  always  consistent.  Figures  3.13(a)  and  3.13(b)  are  copies  of  Figures  3.10  and  3.11 
with  cells  that  lead  to  inconsistent  reconstructions  marked  with  x’s. 

Experiments  were  performed  to  assess  how  the  probability  of  an  inconsistent  estimate 
depends  on  D,  r,  and  A.  The  loose  sense  of  consistency  was  used  in  all  the  experiments. 

The  first  set  of  experiments  involved  quantizing  an  M2- valued  source  with  the  Af{0,I) 
distribution.  With  T>  as  in  (3.8),  M  was  varied  between  2  and  256  while  A  was  varied 
between  10-1'9  and  10°'3.  Figure  3.14  shows  the  probability  of  inconsistency  as  a  function  of 
M  and  A.  The  probability  of  inconsistency  is  significant!  The  surface  is  rather  complicated, 
but  we  can  identify  two  trends:  the  probability  of  inconsistency  goes  to  zero  as  M  is  increased 
and  as  A  — >  0.  This  can  be  more  clearly  seen  from  two  “slices”  of  a  similar  surface  obtained 
with  T>  as  in  (3.7).  The  slices  are  shown  in  Figure  3.15. 

To  explore  the  dependence  on  D,  experiments  were  performed  for  quantizing  an  M5- 
valued  source  with  the  W(0,/)  distribution.  The  consistency  of  reconstruction  was  checked 


(a)  (b) 

Figure  3.13:  (a)  Partition  of  Figure  3.10  with  regions  leading  to  inconsistent  reconstructions 
marked,  (b)  Partition  of  Figure  3.11  with  regions  leading  to  inconsistent  reconstructions 
marked. 


for  two  iteration  expansions.  Dictionary  sizes  of  M  =  25,  50,  75,  100,  and  125  were  used. 
The  results  are  shown  in  Figures  3.16  and  3.17.  In  Figure  3.16,  the  dictionaries  used 
are  those  corresponding  to  oversampled  A/D  conversion  as  given  in  (2.9).  Figure  3.17  was 
generated  using  dictionaries  of  maximally  spaced  points  [16].  For  both  types  of  dictionaries, 
the  probability  of  inconsistency  goes  to  one  for  very  coarse  quantization  and  goes  to  zero  as 
A  — >  0.  The  qualitative  difference  between  the  curves  indicates  that  there  are  complicated 
geometric  factors  involved  that  are  at  this  time  beyond  our  understanding. 

3.3.4  Relationship  to  Vector  Quantization 

Given  a  vector  in  WN,  quantized  matching  pursuit  produces  an  estimate  from  a  countable  set. 
(If  the  quantizers  have  bounded  ranges,  the  estimate  is  from  a  finite  set.)  Hence  cpiantized 
matching  pursuit  can  be  described  as  a  vector  cpiantization  (VQ)  method;  we  would  like  to 
understand  its  place  among  the  many  existing  VQ  methods. 

A  single  iteration  of  matching  pursuit  is  very  similar  to  shape-gain  VQ,  which  was  intro¬ 
duced  in  [2],  In  shape-gain  VQ,  a  vector  x  £  WN  is  separated  into  a  gain,  g  =  ||,r||  and  a 
.: shape ,  s  =  x/g.  A  shape  s  is  chosen  from  a  shape  codebook  Cs  to  maximize  (x,  s).  Then  a 
gain  g  is  chosen  from  a  gain  codebook  Cg  to  minimize  (g  —  (x,  s))2.  The  similarity  is  clear 
with  Cs  corresponding  to  T>  and  Cg  corresponding  to  the  quantizer  for  q0.  Obtaining  a  good 
approximation  in  shape-gain  VQ  requires  that  Cs  forms  a  very  dense  subset  of  ,5,JV_1,  the 
surface  of  the  unit  sphere  in  WN.  The  area  of  ,5,JV_1  increases  exponentially  with  V,  making 
it  difficult  to  use  shape-gain  VQ  in  high  dimensional  spaces.  A  multi-iteration  application 
of  matching  pursuit  can  be  seen  as  a  cascade  form  of  shape-gain  VQ. 
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Figure  3.16:  Probabilities  of  inconsistent  reconstruction  for  an  W5- valued  source.  Dictionaries 
correspond  to  over  sampled  A/D  conversion. 


Figure  3.17:  Probabilities  of  inconsistent  reconstruction  for  an  :  5  valued  source.  Dictionaries 
composed  of  maximally  space  points  on  the  unit  sphere. 
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Although  our  discussion  has  been  in  the  language  of  linear  expansions,  matching  pursuit 
can  be  seen  to  give  partition  cells  and  reconstruction  points.  For  an  optimal  VQ  codebook, 
the  centroid  condition  must  hold:  the  reconstruction  value  for  a  partition  cell  must  be 
the  centroid  of  the  cell  with  respect  to  the  probability  density  of  the  source.  Even  if  we 
make  a  simplifying  assumption  such  as  a  uniform  distribution  of  the  source,  the  codebook 
given  by  matching  pursuit  (assuming  reconstruction  according  to  (3.13))  does  not  satisfy 
the  centroid  condition.  This  is  shown  in  Figures  3.9  and  3.12,  where  inconsistency  is  an 
extreme  case  of  non-centroid  reconstruction.  Viewed  in  this  way  alone,  matching  pursuit  is 
a  bad  vector  quantization  method.  However,  recall  that  if  optimal  trained  VQ  is  used,  the 
centroid  values  (reconstruction  points)  must  all  be  stored.  By  basing  the  codebook  on  linear 
expansions,  we  are  considerably  lowering  the  storage  requirements.  Referring  to  Figures  3.9 
and  3.12,  centroids  could  be  calculated  with  respect  to  a  uniform  distribution  and  used 
as  reconstruction  points,  replacing  (3.13).  The  structure  of  the  partition  would  allow  the 
reconstruction  points  to  be  stored  efficiently. 

3.4  A  General  Vector  Compression  Algorithm  Based 
on  Frames 

This  section  explores  the  efficacy  of  using  matching  pursuit  as  an  algorithm  for  lossy  com¬ 
pression  for  vectors  in  RN.  Most  lossy  compression  can  be  viewed  as  compressing  vectors  in 
Rw,  although  the  source  distribution  will  depend  on  the  application.  The  application  may 
also  give  coding  constraints  (such  as  requiring  a  fix  bit  rate)  and  complexity  constraints, 
and  may  suggest  a  relevant  distortion  metric.  Here  we  will  measure  rate  by  entropy,  thus 
implicitly  allowing  variable  bit  rates,  and  measure  distortion  by  MSE.  Experimental  results 
will  be  given  for  autoregressive  sources,  but  distributional  knowledge  will  not  be  used  in  the 
design. 

3.4.1  Design  Considerations 

With  no  distributional  assumptions,  we  expect  the  best  performance  with  a  dictionary  that 
is  “evenly  spaced”  on  the  unit  sphere  or  a  hemisphere.  We  are  purposely  vague  about  the 
meaning  of  evenly  spaced,  since  the  importance  of  this  is  not  clear.  For  simplicity,  the  inner 
product  quantization  is  uniform.  It  is  unlikely  that  any  other  fixed  quantization  would  do 
better  over  a  large  class  of  source  distributions.  Furthermore,  the  quantization  stepsize  A  is 
constant  across  iterations.  This  is  consistent  with  equal  weighting  of  error  in  each  direction. 

In  our  earlier  examples,  three  methods  for  generating  dictionaries  have  been  used.  In 
M2,  dictionaries  were  formed  from  roots  of  unity  as  in  (3.7)  and  (3.8).  In  higher  dimensions, 
dictionaries  were  formed  from  sets  of  maximally  spaced  points  on  the  unit  sphere  [16]  or 
from  a  Fourier  transform-like  set  as  in  (2.9).  We  introduce  one  more  method  for  generating 
dictionaries.  The  corners  of  the  hypercube  [— ^=,  form  a  set  of  2N  symmetric  points 

on  the  surface  on  the  unit  sphere  in  RN.  Taking  the  subset  of  points  that  have  a  positive 
first  coordinate  gives  a  frame  of  size  2JV_1.  Properties  of  the  dictionaries  that  will  be  used 
in  the  remainder  of  the  section  are  summarized  in  Table  3.1. 
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I.  DFT-like  set  given  by  (2.9) 

Advantages: 

•  Inner  products  can  be  found  with  an  FFT-like  algorithm 

•  No  need  to  store  dictionary 
Comment: 

•  Dictionary  elements  lie  in  the  intersection  of  the  unit  sphere  with  the 
plane  x1  =  -j=. 

IF  Maximally  spaced  points  on  the  unit  sphere  from  [16] 

Disadvantages: 

•  Dictionary  must  be  stored. 

•  Known  only  for  N  =  3,  4,  5,  and  M  <  130. 

III.  Corners  of  hypercube 

Advantages: 

•  Inner  products  can  be  found  with  additions  and  subtractions  only 
(no  multiplications). 

•  Can  choose  V  without  calculating  any  inner  products.  (Signs  of 
components  of  Rif  determine  which  dictionary  element  should  be  chosen.) 

•  No  need  to  store  dictionary 
Disadvantage: 

•  No  flexibility  in  choice  of  M  for  fixed  N. 


Table  3.1:  Summary  of  dictionaries  used  in  compression  experiments 

3.4.2  Experimental  Results 

The  experiments  all  involve  quantization  of  a  zero  mean  Gaussian  AR  source  with  correlation 
coefficient  p  =  0.9.  Source  vectors  are  generated  by  forming  blocks  of  N  samples.  Rate  is 
measured  by  summing  the  (scalar)  sample  entropies  of  k0}  £q,  .  .  .,  £y_i  and  So,  aq,  .  .  .,  dyT i, 
where  p  is  the  number  of  iterations  of  the  algorithm. 

Figure  3.18  shows  the  D(R )  points  obtained  using  Method  I  with  N  =  9.  The  dictionary 
redundancy  ratio  is  r  =  8.  The  dotted  curves  correspond  to  varying  p,  with  the  leftmost 
and  rightmost  curves  corresponding  to  p  =  1  and  p  =  3,  respectively.  The  points  along  each 
dotted  curve  correspond  to  various  values  of  A.  The  solid  curve  shows  the  performance  of 
independent  quantization  in  each  dimension. 

The  lower  boundary  of  the  region  bounded  below  by  one  or  more  dotted  curves  is  the 
best  R-D  performance  that  can  be  achieved  with  this  dictionary  through  the  choice  of  p 
and  A.  The  simulation  results  show  that  matching  pursuit  performs  as  well  or  better  than 
independent  scalar  quantization  for  rates  up  to  about  2.2  bits  per  source  sample. 

The  simulation  described  above  does  not  explore  the  significance  of  the  r  parameter. 
Simulations  as  above  were  performed  with  r  ranging  from  1  to  256.  Redundancy  factors 
between  2  and  8  resulted  in  the  best  performance. 

A  large  fraction  of  the  rate  comes  from  coding  the  indices.  In  an  attempt  to  exploit  the 
fact  that  pkt  and  are  often  nearly  orthogonal,  experiments  were  also  performed  where 
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Figure  3.18:  R-D  performance  of  matching  pursuit  quantization  with  one  to  three  iterations. 
(N  =  9,  r  =  8,  dictionary  of  type  I.) 


a  single  entropy  code  was  applied  for  (A0,  Ay,  .  .  .  ,  Ap_ i).  We  refer  to  this  as  vector  entropy 
coding  of  the  indices.  The  entropy  coding  of  the  coefficients  remained  scalar.  Figure  3.19 
shows  the  results  of  experiments  with  a  dictionary  of  type  II.  A  dictionary  of  size  M  =  8 
is  used  in  ffi4.  The  dashed  curve  results  from  using  matching  pursuit  with  scalar  entropy 
coding  of  the  indices.  The  dash-dot  curve  shows  the  improvement  resulting  from  vector 
entropy  coding  of  the  indices.  The  “knees”  in  these  curves  correspond  to  rates  at  which  the 
optimal  number  of  iterations  changes.  For  comparison,  the  solid  curve  gives  the  performance 
of  scalar  quantization  with  scalar  entropy  coding.  Replacing  the  scalar  entropy  coding  by 
vector  entropy  coding  gives  the  dotted  curve. 

At  rates  up  to  about  1.4  bits  per  source  sample,  matching  pursuit  quantization  out¬ 
performs  scalar  quantization,  even  with  vector  entropy  coding.  (At  these  rates,  the  index 
entropy  coding  method  is  immaterial  because  it  is  best  to  have  only  one  iteration.)  Com¬ 
paring  to  simple  scalar  quantization  with  scalar  entropy  coding,  matching  pursuit  performs 
about  as  well  or  better  over  the  range  of  rates  considered,  up  to  3.5  bits  per  source  sample. 

This  simulation  shows  that  vector  entropy  coding  of  indices  gives  modestly  improved 
performance  at  high  rates.  At  high  rates  it  may  at  first  appear  that  independent  quantization 
with  vector  entropy  coding  is  far  superior  to  other  methods,  but  we  must  consider  the 
complexity  involved  in  the  entropy  coding.  Consider  operation  at  2  bits/sample.  The  optimal 
number  of  matching  pursuit  iterations  is  two,  so  the  vector  entropy  code  for  the  indices  has 
82  =  64  symbols.  The  entropy  codes  for  q0  and  op  have  20  and  6  symbols,  respectively. 
On  the  other  hand,  the  vector  entropy  code  for  the  independently  quantized  vectors  has 
144  =  38416  symbols.  Thus  with  limited  computational  resources,  the  matching  pursuit 
quantizer  may  be  the  best  choice. 

Figure  3.20  shows  simulations  results  using  the  typeTII  dictionary  with  N  =  8  (M  =  2' ). 
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Figure  3.19:  Simulation  results  for  N  =  4,  M  =  8  with  dictionary  of  type  II. 

The  curve  types  have  the  same  correspondence  as  in  Figure  3.19;  the  results  are  qualitatively 
similar. 


3.4.3  A  Few  Possible  Variations 


The  experiments  of  the  previous  subsection  are  the  tip  of  the  iceberg  in  terms  of  the  pos¬ 
sible  design  choices.  In  this  subsection,  a  few  possible  variations  are  presented  along  with 
plausibility  arguments  for  their  application. 

An  obvious  area  to  study  is  the  design  of  dictionaries.  For  static,  untrained  dictionaries, 
issues  of  interest  include  not  only  R-D  performance,  but  also  storage  requirements,  complex¬ 
ity  of  inner  product  computation,  and  complexity  of  largest  inner  product  search. 

Looking  at  the  dictionary  design  problem  from  a  VQ  standpoint,  the  first  impulse  is  to 
train  the  dictionary  using  given  training  data.  Davis  [7,  Ch.  8]  has  applied  a  Lloyd-type 
algorithm  to  optimize  a  dictionary  to  minimize 


D  =  E 


L- 1 

/  X!  Q*FA 
8  =  0 


for  some  fixed  L.  We  would  be  interested  in  the  case  where  the  coefficients  are  quantized  and 
the  minimization  is  of  D  +  A i?,  where  R  is  a  rate  measure  and  A  is  a  Lagrange  multiplier. 
The  result  of  such  an  optimization  must  have  worse  performance  than  a  general  entropy- 
constrained  VQ  design  because  the  matching  pursuit  algorithm  imposes  a  constraint  on  the 
codebook  structure.  However,  the  codebook  structure  may  provide  computational  advan¬ 
tages,  so  this  is  worthy  of  investigation. 

Another  possibility  in  dictionary  design  is  to  adapt  the  dictionary  by  augmenting  it  with 
samples  from  the  source.  (Dictionary  elements  might  also  bq  deleted  or  adjusted.)  This 
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Figure  3.20:  Simulation  results  for  N  =  8  with  dictionary  of  type  III. 

would  be  in  the  spirit  of  the  Lempel-Ziv  algorithm.  The  decoder  would  have  to  be  aware 
of  changes  in  the  dictionary,  but  depending  on  the  nature  of  the  adaptation,  this  may  come 
without  a  rate  penalty. 

There  is  no  a  priori  reason  to  use  the  same  dictionary  at  every  iteration.  Given  a  p 
iteration  estimate,  the  entropy  of  kp  becomes  a  limiting  factor  in  adding  the  results  of  an 
additional  iteration.  To  reduce  this  entropy,  it  might  be  useful  to  use  coarser  dictionaries  as 
the  iterations  proceed. 

In  our  experiments,  we  averaged  results  for  cjuantizing  many  samples  with  some  fixed 
number  of  iterations.  Instead  of  having  a  fixed  number  of  iterations,  it  may  be  useful  to 
use  a  stopping  criterion  based  on  the  energy  of  the  residue.  This  would  create  a  guaranteed 
upper  bound  on  the  error  and  might  have  a  favorable  impact  in  an  R-D  sense. 

The  experimental  results  that  have  been  presented  are  based  on  entropy  coding  each  o] 
independently  of  the  indices,  which  are  in  turn  codec!  either  separately  or  as  a  vector.  There 
are  at  least  three  other  ways  to  entropy  code: 

1.  Separately  code  the  pair  (/y,  q8)  for  each  i; 

2.  Jointly  code  all  of  the  indices  and  jointly  code  all  of  the  coefficients; 

3.  Jointly  code  all  of  the  indices  and  coefficients  together. 

Joint  entropy  coding  of  vectors  increases  complexity  and,  because  of  problems  of  statistical 
significance,  makes  simulating  very  time-consuming.  A  final  coding  variation,  which  was 
mentioned  in  §3.3.3,  is  to  discard  the  indices  that  correspond  to  zero  cpiantized  coefficients. 
This  should  give  a  modest  reduction  in  rate. 

For  a  broad  class  of  source  distributions,  the  distributions  of  the  o,-’ s  will  have  some 
common  properties  because  they  are  similar  to  order  statistics.  For  example*  the  probability 
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density  of  a0  will  be  small  near  zero.  This  could  be  exploited  in  quantizer  design  in  future 
work.  Finally,  rate-distortion  performance  might  be  improved  by  using  quantizers  with 
overload  regions. 


Chapter  4 
Conclusions 


This  report  has  considered  the  effect  of  coefficient  quantization  in  overcomplete  expansions. 
Two  classes  of  overcomplete  expansions  were  considered:  fixed  (frame)  expansions  and  ex¬ 
pansions  that  are  adapted  to  the  particular  source  sample,  as  given  by  matching  pursuit. 

We  first  considered  frame  expansions.  In  Theorem  2.2,  we  proved  that  a  certain  type  of 
sequence  of  frames  approaches  a  tight  frame.  Along  with  being  an  interesting  result  in  its 
own  right,  this  may  help  in  understanding  asymptotic  properties  of  frame  expansions. 

We  defined  the  concept  of  consistency.  Along  with  giving  computational  methods  for 
finding  consistent  estimates,  we  asserted  that  consistency  is  the  essential  criterion  for  good 
reconstruction.  For  an  expansion  with  redundancy  r,  we  proved  that  any  reconstruction 
method  will  give  MSE  that  can  be  lower  bounded  by  an  0(l/r2)  expression.  Backed  by 
experimental  evidence  and  a  proof  of  a  restricted  case,  we  conjecture  that  any  reconstruction 
method  that  gives  consistent  estimates  will  have  an  MSE  that  can  upper  bounded  by  an 
0(l/r2)  expression. 

After  reviewing  the  matching  pursuit  algorithm,  we  showed  that  it  shares  some  important 
properties  with  the  Karhunen-Loeve  transform  (KLT).  For  an  ellipsoidal  source  distribution, 
matching  pursuit  in  some  sense  finds  the  principal  axes.  Also,  it  gives  better  energy  com¬ 
paction  than  the  KLT. 

We  showed  that  the  partitions  generated  by  quantizing  the  coefficients  in  matching  pur¬ 
suit  are  very  intricate.  We  also  showed  that  consistency  is  an  issue  in  this  type  of  rep¬ 
resentation  and  gave  explicit  conditions  for  consistency.  The  potential  lack  of  consistency 
shows  that  even  though  matching  pursuit  is  designed  to  produce  a  linear  combination  to  esti¬ 
mate  a  given  source  vector,  optimal  reconstruction  in  the  presence  of  coefficient  quantization 
requires  a  nonlinear  algorithm. 

Finally,  we  considered  applying  matching  pursuit  as  a  general  vector  compression  method. 
The  overhead  in  using  this  method  is  coding  the  indices  of  the  dictionary  elements  used. 
Therefore,  in  choosing  a  dictionary  size  there  is  a  tradeoff  between  increasing  overhead  and 
enhancing  the  ability  to  closely  match  signal  vectors  with  a  small  number  of  iterations.  Since 
it  is  a  successive  approximation  method,  matching  pursuit  may  be  useful  in  a  multiresolution 
framework.  The  inherent  hierarchical  nature  of  the  representation  is  amenable  to  unequal 
error  protection  methods  for  transmission  over  noisy  channels. 

Matching  pursuit  acts  as  a  “universal  transform,”  giving  good  energy  compaction  without 
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knowledge  of  source  statistics.  This  method  gets  much  of  its  compression  gain  from  entropy 
coding.  Thus,  by  coupling  it  with  adaptive  and/or  universal  lossless  coding,  it  could  work 
well  as  an  adaptive  and/or  lossless  vector  compression  scheme.  We  make  no  optimality 
claims  and  do  not  address  the  issues  of  “redundancy”  and  “estimation  noise,”  as  defined 
in  the  universal  lossy  coding  literature.  Accordingly,  our  usage  of  “universal”  refers  to  the 
properties  of  the  transform  as  opposed  to  the  properties  of  the  entire  compression  system. 


Appendix  A 
Proofs 


A.l  Spherical  Coordinates  in  Arbitrary  Dimension 

Since  the  usage  of  spherical  coordinates  in  dimensions  greater  than  three  is  not  common,  a 
review  is  presented  here.  Spherical  coordinates  will  be  useful  in  the  proof  of  Theorem  2.2 
(§A.3). 

In  M3,  the  standard  way  to  define  a  transformation  between  rectangular  coordinates 
(x,  y,  z)  to  spherical  coordinates  (p,  9}  lu)  is  through 

x  =  pcos^sincu 
y  =  p  sin  9  sin  lu 
z  =  pcoscu, 

where  p  £  [0,  oo),  9  £  [0,  27t),  and  lu  £  [0,  7r],  It  is  instructive  to  notice  that  to  go  from 
polar  coordinates 


x  =  p  cos  9 
y  =  p  sin  9 

to  spherical  coordinates,  one  defines  a  new  angular  variable  lu  £  [0,  tt]  ,  multiplies  the  existing 
coordinate  definitions  by  sincu,  and  sets  the  new  coordinate  variable  z  to  p  cos  lu.  Continuing 
this  process  inductively  gives  spherical  coordinates  in  arbitrary  dimension. 

For  N  >  3,  define  spherical  coordinates  (p,  0,  c^i,  .  .  .  ,  cujv- 2)  implicitly  from  rectangular 
coordinates  (ap,  x2}  .  .  .  ,  xjv)  as  follows: 


X\ 

=  p  cos  6  sin  lu\  sin  lu2  .  . 

.  sincujv-2 

X2 

=  p  sin  9  sinc^i  sin  lu2  .  . 

.  sinayv-2 

x3 

=  pcos  lu\  sincu2  .  .  .  sincujv-2 

X4 

=  pcos  lu2  sin  lu3  ...  sin  cujv-2 

XN-1 

=  p  cos  cujv-3  sin  cujv-2 

Xn 

=  pCOSCUjV-2 
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Here  p  £  [0,  oo),  6  £  [0,  2i r),  and  Ui  £  [0,  7 r]  for  i  =  1,  2,  .  .  . ,  N  —  2.  Note  that  this  can  be 
viewed  as  a  way  to  parameterize  vectors  of  length  p  in  RN . 

By  direct  calculation,  the  Jacobian  of  the  transformation  is 


d(xj,x2,  ■  ■  -,xN) 

d(p,0,cj!, .  .  .  ,lun_2) 


N- 1  •  -2  •  N- 2 

=  p  smwi  sm  w2  .  . .  sin  cujv- 2- 


(A.l) 


A. 2  Proposition  2.1 

A  condition  for  $  to  span  H  is  that 

(/,  ifk)  =  0  V  k  £  K  =P  f  =  0. 

This  is  immediate  from  (2.4).  It  remains  to  show  that  the  are  orthonormal.  For  any 

k  £  K, 

\Wk\\2  =  Kw,  Pe)\2  =  \\pk\\4  +  Kw,  Pe)\2  ■ 

tei<  iei<\{k} 

Now  ||(/?fc||  =  1  implies  {pk,  Pi)  =  0  for  all  £  ^  k. 
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where  a2  =  E\(ipk)f\  and  /i4  =  E\(<pk)f\  [27,  §8-1].  For  the  off-diagonal  elements  (i  ^  j), 


E 

Var 


<¥f*f»- 


0 

—  F  \(m  \2(m  t2] 


(A.4) 

(A.5) 


Noting  that  a2  and  p4  are  independent  of  M,  (A. 3)  shows  that  Var  ^(■A-F*F)84j  — >  0  as 
M  — >  oo,  so  (jjF*F)n  — >  a2  in  the  mean-squared  sense  [27,  §8-4],  Similarly,  (A.4)  and  (A.5) 
show  that  for  i  ^  j,  {  — >  0  in  the  mean-squared  sense.  This  completes  the  proof, 

provided  a2  =  A. 

We  now  derive  explicit  formulas  (depending  on  N )  for  <r2,  /i4,  and  E  [(^it)i  (^it)j]  •  For 
notational  convenience,  we  omit  the  subscript  k  and  use  subscripts  to  identify  the  components 
of  the  vector. 

To  compute  expectations,  we  need  an  expression  for  the  joint  probability  density  of 
■  ■  ■  tPn)-  Denote  the  n-dimensional  sphere  centered  at  the  origin  with  radius  p  by 
Sp.  Since  p>  is  uniformly  distributed  on  the  surface  of  ,  the  p.  d.  f.  of  ip  is  given  by 

f(ip)  =  — ,  V  f  e  dS4,  (A. 6) 

cjv 

where  cjv  is  the  surface  area  of  .  We  can  compute  cjv  as  follows: 


Cjv  = 


dA  where  dA  is  a  differential  area  element 

95f 

27T  /*  7T  /*7T 


PZ7T  p7T  p7T  p7T 

/  /  /  **•  /  sinc^i  sin2  lu2  . . .  sin^-2  lun-2  dujjsj-2  •  •  •  dcoidO  (A. 7) 

Jo  Jo  Jo  Jo 


/*27T  \  /  p7T  \  /  p7 T 

j  d6j  yj  Anuiduij  yj  sin2  uj2  du? 


sin  cujv-2  dcujv-2  (A. 8) 


In  (A. 7)  we  have  parameterized  the  surface  of  the  sphere  with  spherical  coordinates  and  used 
the  differential  area  segment  given  by  (A.l).  Using 

[v  ■  2n  a, a  1  •  3  •  5  •  •  •  (2n  —  1)  , 

/  sin2  6  dd  =  - - — 7T  and 

Jo  2  •  4  •  •  •  (2n) 

r  sin2n+1  e  de  =  2  2'4"'(2n) 


jo  1  •  3  •  5  •  •  •  (2n  +  1) 

we  can  simplify  (A. 8)  to  get  the  following  familiar  result  [3,  §1.4]: 

NttnF  A^V^-1)/2^)! 


Cjv  = 


(N/  2)! 


V! 


(A.9) 


Using  (A. 6),  we  can  make  the  following  calculation: 
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f  — l 0%  dA  where  dA  is  a  differential  area  element 

Jdsd  cN  JV 

P7T 

/  (cos  cujv-2)2  sin  sin2  u2  .  . .  sin^-2  ujn- 2  dcujv- 2  •  •  •  duidd(A.lO) 
Jo 

=  —  (  /  dd  j  (^  j  sinc^i  dui^j  (^  j  sin2  uj2  du2 
(^J  sin^-3  cujv-3  do2jv-3^  cos2  ujn. 


Cjv 

1*27 r  /*7T 

Cjv  do  do 

1  /  /*27T 


Cjv  WO 


V-2 


sin^  2  cujv-2  dcujv-2 


-1 


,JV-2 


•  iV  — 1 

COS  U2j\r 2  Sin  02^  —  2 

iv 


I  1  f7T  •  N  —  2  7 

+  Tv  Jo  sm  ^N-2  dlON—2 


fo  sin  tujv-2  dcujv-2 


CUJV-2  d(jJN-2J 

2  dcUjV-2^ 

(4.11) 

(A. 12) 

1 

iV 


In  this  calculation,  (A. 10)  results  from  using  spherical  coordinates  and  (A. 11)  follows  from 
substituting  (A. 8)  and  cancelling  like  terms.  The  simplification  (A.  12)  is  due  to  a  standard 
integration  formula  [30,  #323].  Similar  calculations  give 


hr  =  E  = 


and,  for  i  #  j, 


e  [rfd  = 


N(N  +  2) 

1 

A#VT2)' 


(A.13) 


(A.14) 


A. 4  Proposition  2.5 

Subtracting 


M 


from 


gives 


/  =  Z((/>  <Pk)  +  Pk)pk 

k= 1 

M 

f  =  Z(/>  Pk)pk 


k= 1 


M 


f  ~  f  = 

k= 1 


Then  we  can  calculate 


MSE  =  E 

=  E 


=  E 


/  M 

Z  I  (  Z  PkEk 


M 

Z  PkPk 

k= 1 

M  \ 


L  \;=i 


\fc=i 


APPENDIX  A.  PROOFS 


52 


< 


< 


-MM 

e  EE  ^pk 

-i= 1  k= 1 
M  M 

E  E  ^kv2pi*pk 

2=1  k= 1 
M 

^2E  ll^ll2 

k= 1 
M 

fc=i 
M 

<*2  Zinur1!  M 


k= 1 


Ma2 

Mu2 

~A 


*  7?\-l 


(F’F) 


( A .  1 5 ) 


(A. 16) 

(A.17) 

(A-18) 
(A. 19) 


where  (A.  15)  results  from  evaluating  expectations  using  the  conditions  on  /3,  (A.  16)  uses 
(2.11),  (A. 18)  uses  the  normalization  of  the  frame,  and  (A. 19)  follows  from  (2.10). 

If  $  is  a  tight  frame,  equality  holds  in  (A.17)  and  (A. 19).  Also,  due  to  the  normalization 
of  the  frame,  A  =  Thus 


MSE 


Ma 2 
(. M/N )2 


N2a2  _  Na2 
M  r 


Appendix  B 

Frame  Expansions  and  Hyperplane 
Wave  Partitions 


This  appendix  gives  an  interpretation  of  frame  coefficients  as  measurements  along  different 
directions.  This  allows  us  to  understand  the  partitioning  of  RN  induced  by  frame  coefficient 
quantization  without  appealing  to  intersections  with  the  partitioning  of  Mm.  We  will  also 
touch  on  efficient  coding  of  frame  coefficients. 

Given  a  frame  $  =  the  kth  component  of  y  =  Fx  is  y k  =  (x,  pk)-  Thus  y k  is  a 

measurement  of  x  along  pj~.  We  can  thus  interpret  y  as  a  vector  of  M  “measurements”  of  x 
in  directions  specified  by  $.  Notice  that  in  the  original  basis  representation  of  x,  we  have  N 
measurements  of  x  with  respect  to  the  directions  specified  by  the  standard  basis.  Each  of  the 
N  measurements  is  needed  to  fix  a  point  in  RN .  On  the  other  hand,  the  M  measurements 
given  in  y  have  only  N  degrees  of  freedom. 

Now  let’s  suppose  y  is  scalar-quantized  to  give  y  by  rounding  each  component  to  the 
nearest  multiple  of  A.  Since  y k  specifies  the  measurement  of  a  component  parallel  to  pk} 
Vk  =  (*  +  f)A  specifies  a  hyperplane  (A  — 1  dimensional  manifold)  perpendicular  to  p^.  Thus 
quantization  of  y &  gives  a  set  of  parallel  hyperplanes  spaced  by  A,  called  a  hyperplane  single 
wave.  The  M  hyperplane  single  waves  give  a  partition  with  a  particular  structure  called  a 
hyperplane  wave  partition  [35]. 

Examples  of  hyperplane  wave  partitions  are  shown  in  Figure  B.l.  Figure  B.l(a)  shows 
a  frame  in  R2  composed  of  three  vectors.  Suppose  x  £  M2  is  specified  by  quantized  inner 
products  with  each  of  the  three  frame  vectors.  The  quantization  of  the  inner  product  with 
the  black  vector  gives  the  black  hyperplane  single  wave.  Similarly  for  the  red  and  blue  frame 
vectors.  Figure  B.l(b)  gives  an  example  with  M  =  5. 

We  can  now  interpret  increasing  the  redundancy  r  of  a  frame  as  increasing  the  number 
of  directions  in  which  x  is  measured.  It  is  well-known  that  MSE  is  proportional  to  A2. 
Section  2.2.4  presents  a  conjecture  that  MSE  is  proportional  to  1/r2.  This  conjecture  can  be 
recast  as  saying  that,  asymptotically,  increasing  directional  resolution  is  as  good  as  increasing 
coefficient  resolution.  This  is  shown  in  Figure  B.2.  The  initial  partition  is  in  black,  increasing 
coefficient  resolution  is  shown  in  blue  and  increasing  directional  resolution  is  shown  in  red. 

In  §2.2.5  it  was  mentioned  that  coding  each  component  of  y  separately  is  inefficient  when 
r  1.  This  can  be  explained  by  reference  to  Figure  B.l.  Specifying  y i  and  y2  defines  a 
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Figure  B.l:  Examples  of  hyperplane  wave  partitions  in  "Sr;  (a)  M 


3.  (b)  M  =  5. 


Figure  B.2:  Two  ways  to  refine  a  partition:  (a)  Increasing  coefficient  resolution,  (b)  Increas¬ 
ing  directional  resolution. 
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parallelogram  within  which  x  lies.  Then  there  are  a  limited  number  of  possibilities  for  y3. 
(In  Figure  B.l(a),  there  are  exactly  two  possibilities.  In  Figure  B.l(b),  there  are  three  or 
four  possibilities.)  Then  with  y 4,  y2,  and  y3  specified,  there  are  yet  fewer  possibilities  for  y4. 
If  this  is  exploited  fully  in  the  coding,  the  bit  rate  should  only  slightly  exceed  the  logarithm 
of  the  number  of  partition  cells. 


Appendix  C 

Lattice  Quantization  Through  Frame 
Operations 


A  lattice  A  is  a  set  of  points  consisting  of  sums  of  the  form  J2%=i£kVk,  where  the  G  are 
integers  and  the  vectors  iq,  .  ..,  njv  are  called  a  basis  of  the  lattice  [3].1  A  lattice  vector 
quantizer  is  a  nearest-neighbor  quantizer  whose  reproduction  values  form  a  lattice.  This 
appendix  establishes  a  relationship  between  lattice  vector  quantization  and  quantized  frame 
representations.  We  will  see  that  in  certain  circumstances  lattice  vector  quantization  can  be 
achieved  by  quantized  frame  expansion  followed  by  operations  on  discrete  variables. 

Given  a  lattice  A,  the  basis  is  not  unique.  For  example,  given  a  basis  {iq,  u2,  .  .  .  ,  vn },  we 
can  form  another  basis  through  uq  =  Try,  where  T  is  any  invertible  element  of  ZnXn.  Without 
loss  of  generality,  we  will  assume  that  the  basis  {iq,  u2,  .  .  . ,  vn }  is  one  that  minimizes  the 
norms  of  the  basis  elements  such  that 

i 

for  all  nontrivial  sets  of  integer  coefficients  and  for  all  k.2  Such  a  basis  exists  because  if  (C.l) 
does  not  hold  for  some  k,  v ^  can  be  replaced  in  the  basis  by  °ivi- 

We  would  like  to  describe  the  partition  cells  of  the  lattice  vector  quantizer  associated  with 

A.  Since  a  lattice  is  invariant  to  any  shift  that  moves  the  origin  to  another  lattice  point,  the 

Voronoi  cells  are  congruent.  For  notational  convenience,  we  will  consider  the  region  mapped 
to  the  origin  by  the  quantizer.  Nearest-neighbor  encoding  implies  that  the  region  mapped 
to  the  origin  is3 

Ro  =  {x  e  RN  :  INI  <  ||a;  -  A||  V  A  G  A  \  {0}}  .  (C.2) 

This  is  an  infinite  number  of  half-space  constraints.  It  is  shown  in  [12,  §VI.  A.]  that  by 

removing  redundant  constraints  (those  corresponding  to  hyperplanes  far  from  the  origin), 
(C.2)  can  be  replaced  by  a  finite  number  of  constraints.  The  number  of  remaining  constraints 

1  It  is  implicit  that  the  origin  is  an  element  of  the  lattice. 

2 This  does  not  uniquely  describe  the  basis.  It  is  equivalent  to  choosing  a  basis  which  minimizes  the 
surface  area  of  the  fundamental  parallelotope.  (The  volume  of  the  fundamental  parallelotope  is  fixed  by  A.) 
See  [3,  §1.2  of  Ch.  1], 

3The  boundaries  can  be  arbitrarily  defined. 


>  IN 


(C.l) 
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depends  on  the  lengths  of  the  basis  vectors;  enforcing  (C.l)  minimizes  the  number  of  hyper¬ 
plane  constraints.  Denote  the  minimum  number  of  half-space  constraints  to  describe  R0  by 
L.  There  exists  correspondingly  Al  C  A  such  that 

Ro  =  £  Kw  :  \\x\\  <  \\x  —  A 1 1  V  A  £  . 

By  symmetry,  A  £  Al  implies  —A  £  Al-  Thus  the  constraints  are  in  the  form  of  L/ 2  pairs 
of  parallel  hyperplanes. 

To  describe  the  entire  lattice  partition  requires  not  only  the  L  hyperplanes,  but  also  those 
hyperplanes  translated  to  every  lattice  point.  For  some  lattices,  some  of  the  hyperplanes  will 
coincide,  resulting  in  a  hyperplane  wave  partition.  In  these  cases,  the  lattice  VQ  partition 
cells  are  unions  of  hyperplane  wave  partition  cells,  so  lattice  VQ  can  be  achieved  by  a 
quantized  frame  expansion  followed  by  the  discrete  operation  of  cell  unioning. 

The  familiar  hexagonal  tiling  of  M2  is  an  example  of  a  lattice  VQ  partitioning  that  can 
be  derived  from  a  hyperplane  wave  partition.  Figure  C.l  shows  the  lattice  generated  by 
Vi  =  [x/3  1]T  and  v2  =  [0  2]T.  In  this  case,  discarding  remote  hyperplanes  as  in  [12, 

§VI.  A.]  leaves  six  half-space  constraints  for  R0.  Furthermore, 

A6  =  {ui,  v2,  V2  -  Cl,  -Cl,  -v2,  vx  -  v2}. 


The  solid,  dashed,  and  dotted  curves  correspond  to  the  nearest-neighbor  conditions  for  ±iq, 
±n2,  and  ±(iq  —  v2),  respectively.  The  hyperplane  wave  partition  shown  in  Figure  C.l  is 
equivalent  to  that  generated  by  a  quantized  frame  expansion  with 


V2  Vi  —  V2  Vi  I 

2~’  2  ’  ~Y) 

and  A  =  1.  (The  choice  of  $  is  not  unique.) 

The  cells  in  the  hyperplane  wave  partition  are  equilateral  triangles.  By  joining  the  cells 
in  the  hyperplane  wave  partition  in  groups  of  six,  one  generates  the  desired  lattice  partition 
of  M2.  For  concreteness,  the  sequence  of  operations  is  shown  in  Figure  C.2.  T  is  a  frame 
expansion  by  multiplication  with 


T 


0  1 

y/3  _  1 

2  2 
yh  _  i 
2  2  . 


Q  represents  a  uniform  quantizer  which  outputs  the  odd  multiple  of  |  nearest  to 


Hence 


y  c 


2k  +  1 


:  k  £ 


Let 
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Figure  C.l:  A  lattice  in  ffi2  shown  with  the  corresponding  half-space  constraints  for  nearest- 
neighbor  encoding. 


Figure  G.2:  Block  diagram  for  hexagonal  lattice  quantization  of  ffi2  through  scalar  quantiza¬ 
tion  and  discrete  operations. 


Block  g  represents  a  selection  function  that  forms  groups  of  six  cells  from  the  hyperplane 
wave  partition  associated  with  T  and  Q.  Denote  the  output  of  g  by  s  =  [sq  s2]r  G  Z2.  Then 
s  is  determined  by  the  constraints 


and 


2si  +  S-2 
Si  +  2s2 

U  = 


■5i 

v  — 

y  = 

■52 

—  ^1  —  S-2 

=  0 

(mod  3) 

=  0 

(mod  3). 

A 

"V3 

-A 

i 

sr?0 

Finally,  x  =  Us,  where 
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